System with Improved Timing Control

PARTIAL WAIVER OF COPYRIGHT
All of the material in this patent application is subject to copyright
protection under the copyright laws of the United Kingdom, the United States,
and of other countries. As of the first effective filing date of the present

However, permission to copy this material is hereby granted to the
extent that the copyright owner has no objection to the facsimile reproduction
by anyone of the patent document or patent disclosure, as it appears in
official patent file or records of the United Kingdom or any other country, but
otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION
The present invention relates to computer systems and subsystems, and
to computer-based methods for data



High-Speed Multiprocessor Architectures

It has long been realised that the use of multiple processors operating
in parallel might in principle be a very convenient way to achieve very high
net throughput. Many such architectures have been proposed. However, the
actual realisation of such architectures is very difficult. In particular, it
is difficult to design an architecture of this kind which will be versatile
enough to satisfy a range of users and adapt to advances in technology.

Many multiprocessor architectures have been proposed, but
it is generally accepted in the art that the problems of programming support
in a multiprocessor architecture have not nearly been solved.

A very recent overview of some of the issues involved in multiprocessor
systems may be found in Dubois et al., "Synchronization, Coherence, and Event
Ordering in Multiprocessors," Computer magazine, February 1988, page 9.

A recently proposed multiprocessor architecture for digital signal
processing is described in the article "An Optimum Parallel Architecture for
High-Speed Real-Time Digital Signal Processing," Computer magazine,
February 1988, page 47.



Inter-Processor Synchronisation
Synchronisation between processors is a continuing critical issue in
a very wide variety of multiprocessor systems. Often such inter-processor
interfaces make use of "processor-waiting" or "processor-ready" status
signals which can be set or cleared by either processor. (Such signals are
commonly known as "semaphores.")

I/O Management

The use of a separate I/O processor, in addition to an interface
controller and at least one other processor, has long been known in
supercomputer and mainframe systems. Such systems will often use a variety of
dedicated processors to handle various specialised tasks.
More recently, such I/O processors have seen use in board-level numeric
processing subsystems. For example, Mercury Computer Systems has announced a
system (Zip3232) which appears to have a separate data-transfer processor,
running asynchronously to the floating-point



20 prn^^ ^'^'^'^^^^ 

A variety of specialised processor architectures, which are particularly
useful for particular applications, have been proposed. However, much less
attention has been paid to how such specialised architectures can be
integrated in a general-purpose processing architecture. Also, relatively
little attention has been paid to generalised interface management for

SUMMARY OF THE INVENTION
The present application provides a large number of innovative
teachings, which will be described in the general context of a system like that
shown in Figure 1.

Among the innovative teachings set forth herein is a system where two
processors are both clocked by a shared variable-duration clock. (In the
embodiment of Figure 1, it is the control processor 110 and the data-transfer
processor 120 which share a clock in this fashion.) Thus, the control
processor 110 and the data transfer processor 120 are enabled to run
synchronously, even though they are concurrently running separate streams of
instructions. Obviously, when one processor has requested a shorter cycle
duration, that processor will lose some efficiency when the clock is
temporarily slowed to produce the longer duration requested by the other
processor. However, this inefficiency is minor, because the majority of
instructions will specify the shortest cycle time.

Thus, in the presently preferred embodiment, the control processor 110
and the data transfer processor 120 are enabled to run synchronously, even
though they are concurrently running separate streams of instructions.
Obviously, when one processor has requested a shorter cycle duration, that
processor will lose some efficiency when the clock is temporarily slowed to
produce the longer duration requested by the other processor. However, this
inefficiency is minor, because the majority of instructions will specify the
shortest cycle time.

In addition, a further level of arbitration is also provided. The
control processor 110 is given priority on access to the cache memory 140. That
is, the data transfer processor 120 must check before every cache access, to
ensure that cache access has not been preempted by the control processor 110.
However, to prevent lock-out, the data transfer processor 120 has an interrupt
signal available to it, which will command the control processor 110 to
release control of the cache port 143 for at least one cycle.

A further innovative teaching herein is a new architecture (based on a
novel allocation of logic families) in a high-speed numeric processor. In the
sample embodiment of Figures 4A through 4D, some very sparing use is made of
ECL, in a TTL board architecture. ECL is used in three important places:

the clock on the high-speed numeric calculation module is a
"smart" clock, using ECL logic, which adjusts the clock durations cycle-by-
cycle, in accordance with the particular instructions being executed by the
arithmetic calculation

an ECL transfer clock is used to provide a transitional clock
domain which is quasi-synchronous to the slower logic outside the numeric
processing module, but also has minor cycles (or clock "beats") at a frequency
comparable to the maximum clock rate which the calculation units (e.g. ALU and
multiplier) can accept. Again, this is a smart clock, which performs
some significant logic operations in between the system clock cycles.

the calculation units preferably have ECL internals
but TTL interfaces.

TWs an o r a tk m oC * tiiu'lin^s is advantagsoua^ net in 
the nwnrinnnii p ossi h ls speed from a given set oC *'*»^ifr*VTn unit% b 

lata traaate path acrooe a very severe dock 
the numeric prnnnsdi^ module and the renwinder of 
20 the ioraten*). 

or courses the a d vwnte g B e of the innovative subtystem enaUed by the 
oa are ato ad vanta ges te the ooo^uter system aa a wholsu 
of the hmovatlve teadik«i est teth hereki can also be 
to renqwUing ^jfitaaaa wfaidi do not eontaitt a sepante subsystem fike 
26 that shown m F^nre 1. La. to nm 
The present invention will be described with reference to the accompanying
drawings, which show important sample embodiments of the invention and which are
incorporated in the specification hereof by reference, wherein:

Figure 1 shows a general overview of a numeric accelerator subsystem having a
novel three processor architecture.

Figure 2A generally shows the organisation of some key parts of the Control
Processor module 110, in the presently preferred embodiment. Figure 2B schematically
shows the field allocations in the microinstruction format used in the Control Processor
module 110, in the presently preferred embodiment.

Figure 3A generally shows the organisation of some key parts of the Data
Transfer Processor module, in the presently preferred embodiment. Figure 3B shows
greater detail of the logic used to selectably drive a constant address onto the sequencer
bus in the Data Transfer Processor module. Figure 3C schematically shows the field
allocations in the microinstruction format used in the Data Transfer Processor module
120, in the presently preferred embodiment.

Figures 4A, 4B, 4C, and 4D generally show the organisation of some key parts
of the numeric processing module 130, which in the presently preferred embodiment is
a Floating-Point Processor. Figure 4A shows some key parts of the interface to the
Control Processor module 110. Figure 4B shows some key parts of the data path in the
Floating-Point Processor, in the presently preferred embodiment. Figure 4C shows some
key parts of the control logic in the Floating-Point Processor, in the presently preferred
embodiment. Figure 4D schematically shows the field allocations in the microinstruction
format used in the Numeric Processor module 130, in the presently preferred
embodiment.

Figure 5 generally shows the organisation of some key parts of the Data Cache
Memory.

Figure 6 generally shows the organisation of some key parts of the Host Interface
Logic, in the presently preferred embodiment.

Figure 7 generally shows the organisation of some key parts of the Data Pipe
Interface Logic, in the presently preferred embodiment.

1304509

Figure 8 generally shows the organisation of some key parts of the GIP Interface
Logic, in the presently preferred embodiment.

Figure 9A shows a general overview of a numeric accelerator subsystem including
an application-customized numeric processing module ("algorithm accelerator") 130'.
Figure 9B schematically shows how the architecture of one example of an algorithm
accelerator 130' differs from that of a general-purpose floating-point module 130.
Figure 10 shows a subsystem including multiple numeric processing modules.



Figure 11 generally shows the organisation of some key parts of the Integer
Processor Unit, which is part of the control processor (and of the data-transfer processor)
in the presently preferred embodiment.

Figure 12 generally shows the organisation of some key parts of the Address
Generator, which is part of the control processor in the presently preferred embodiment.
Figure 13 generally shows the organisation of some key parts of the Sequencer,
which is part of the control processor (and also of the data transfer processor) in the
presently preferred embodiment.

Figure 14A schematically shows the hardware used, in the presently preferred
embodiment, to permit a 16-bit address generator (or other low-resolution subprocessor)
20 to be used ki a sa-bit qratem. Figure 14B ahows the faipiits used in the m^ensA 
of the hardwar e use^ ki the preaenrtr prefeired emhodimect, touae 
<teta soorees ki a Uifi-qpaed qrstem. 
Figure 15 schematically shows the interface between the control processing
module and the data transfer processing module, in the presently preferred embodiment.
26 Figure 16 generaQf allows the orgsniaaftioo of aooie key parU of the prknaiy 

f within the fl**nt in^|***'ff ^ prooeesor In the present^ preferred 



Figure 17 shows the logic used within the floating-point processor in the presently
preferred embodiment, to reduce the setup time for unregistered microcode bits.
Figure 18 shows how a conventional double buffer is organised and controlled.
Figure 19 shows another conventional method for double buffering, where a dual
port register file is used with one of the bits controlled externally.

Figure 20 schematically shows how the innovative double buffer of the presently
preferred embodiment is organised and controlled in software, to provide multiple
optional access modes.

Figure 21 schematically shows the logic used, in the presently preferred
embodiment, for data transfer across a clock boundary between the holding registers,
which interface to the 256-bit wide cache bus, and the Register File.

Figure 22 shows a state diagram of the handshaking logic used, in the presently
preferred embodiment, to provide interfacing between the CP module 110 and the FP
module 130.

Figure 23 schematically shows the control definitions used, in the presently
preferred embodiment, to select among multiple FPs and/or multiple algorithm
accelerators, in a system like that shown in Figures 9 or 10.

Figures 24, 25, and 26 show the architecture of the data interfaces to the cache
memory.

Figure 27 schematically shows the hardware configuration used, in the presently
preferred embodiment, for efficient control of microcode transfer and loading in a
serial loop which interfaces to the writable control stores of several devices.

Figure 28 schematically shows the serial loop configuration used, in the presently
preferred embodiment, to permit microcode loading to any one of several processors, or
to some groups of processors.

Figure 29 schematically shows the logic used, in the presently preferred
embodiment, to permit either serial or parallel write into the control store of a numeric
processor in a multi-processor system.

Figure 30 schematically shows the microcode operation used in the presently
preferred embodiment to provide multiway branching without address boundary
constraints.

Figure 31 schematically shows a method of running a discrete Fourier transform
algorithm.




Figure 32 shows a method of running a histogram algorithm, in hardware like
that shown in Figure 16.

Figure 33 shows a method of running a pipelined algorithm, in hardware which
includes a software-controlled double buffer like that shown in Figure 20.
Figures 34, 35, 36, and 37 schematically show configurations of multiple
subsystems like that of Figure 1, each of which includes a data pipe interface like that
shown in Figure 7.

Figure 38A generally shows the preferred physical layout of the main board, and
Figure 38B generally shows the preferred physical layout of a daughter board which fits
onto the board of Figure 38A. The board of Figure 38B contains key components of FP
module 130. The two boards together provide a complete system like that shown in
Figure 1.

Figure 39 shows the preferred embodiment of the stack register in the floating-
point processor module 130.
Figure 40A shows some support logic which is used, in the presently preferred
embodiment, with the sequencer in the control processor module 110 (and in the data
transfer module 120). Figure 40B schematically shows a microinstruction sequence
wherein an interrupt occurs during a multiway branch operation.

Figure 41 schematically shows a computer system including a host computer, a
picture processor subsystem, and at least two numeric accelerator subsystems, linked by
a main bus and two high-bandwidth backplane busses.

Figure 42 schematically shows the flow of steps, in a system like that shown in
Figure 1, to multiply two arrays together (on an element by element basis) and deposit
the results in a third array.
Figure 43 shows a sample system which includes a high-speed cache expansion
memory on the same very wide data bus as one or more numeric processing modules.

Figures 44A, 44B, and 44C schematically indicate the programming environment
of the CP, DTP, and FP modules respectively.

Figure 45 shows logic for substituting the contents of an instruction register for
a field of microcode from control stores.
Figure 46 shows how word address odd/even structure results from the double-word
transfer operations.

Figure 47 shows the timing structure used for how word address odd/even structure
results from the double-word transfer operations.



DESCRIPTION OF THE PREFERRED EMBODIMENTS
The numerous innovative teachings of the present application will be described
with particular reference to the presently preferred embodiment, wherein these innovative
teachings are advantageously applied to the particular problems of subsystems which
work under the direction of a host computer to handle high-speed numeric computing.
(Such subsystems are commonly referred to as "accelerator boards.") However, it should
be understood that this embodiment is only one example of the many advantageous uses
of the innovative teachings herein. For example, the various types of the architectural
innovations disclosed herein can optionally be adapted to a wide variety of computer
system contexts. In general, statements made in the specification of the present
application do not necessarily delimit any of the various claimed inventions. Moreover,
some statements may apply to some inventive features but not to others.

The present invention will be described with particular reference to the context
of a system embodiment like that shown in Figure 1 (or, alternatively, those of Figures
9A, 10, 41, or 43). It should be understood that the features of these embodiments are
not all necessary parts of the present invention, but they do provide the context in which
the preferred embodiment will be described.

Figure 1 generally shows an architecture for a numeric processing system, which
normally is used as a subsystem of a larger computer system. Systems like that of
Figure 1 are commonly referred to as "accelerator boards". They are normally used as
subsystems. That is, a supervisor processor will provide a high-level command to the
accelerator subsystem. For example, the supervisor processor may order the accelerator
subsystem to perform a vector add, a matrix inversion, or a fast Fourier transform (FFT).
The accelerator subsystem will then fetch the data from the location specified by the
supervisor processor, perform the number-crunching operations, and return the result to
the supervisor processor.

Figure 1 shows an architecture with three different processor modules, all of
which can run separate tasks concurrently. These three modules are the control
processor (CP) module 110, the data transfer processor (DTP) module 120, and the
numeric processing module 130. (This numeric processing module is preferably a
floating-point processing module, and will therefore often be referred to as the "FP"
module. Various other types of numeric processing modules can be used, as will be
discussed below.) The numeric processor module 130 runs asynchronously to the other
two processors, i.e. with a completely independent clock. In addition, the external
interfaces 150, 160, 170, and 180 also contain substantial amounts of logic.

The structure of the data cache memory 140, and its relation to the other blocks
in the system, is quite significant. The data cache memory 140 is connected to the
floating-point processor 130 by a wide cache bus 144. In the presently preferred
embodiment, the cache bus 144 includes 256 physical lines reserved for data.

The three types of processor modules permit easy task allocation. The primary
allocation of tasks is as follows:

the data transfer processor manages the interface to the outside world,
through the external interfaces, and also handles data transfers between the cache memory
and the outside world;

the control processor 110 performs address calculations, and controls all
data transfers to and from the numeric processing module 130; and

the numeric processing module 130 performs data calculations.
Designing an efficient high-speed system to support this allocation of tasks
requires that some architectural difficulties be solved. However, the disclosed
innovations solve these difficulties, and the result turns out to be surprisingly
advantageous.

To facilitate realisation of such an architecture, the embodiment of Figure 1
includes several notable hardware features. First, the control processor 110 includes a
very large capability for address calculation operations. In the presently preferred
embodiment, as generally shown in Figure 2, this processor includes not only a
sequencer, but also address generation logic and an arithmetic-logic-unit (ALU).

The data transfer processor 120 supervises the operation of the external interface
controllers. In the presently preferred embodiment, there are actually three external
interface controllers. These include a VME bus interface 160, and also controllers for two
backplane busses. (One backplane bus is a "data pipe," which provides a high-bandwidth
link between accelerators, and the other is a "GIP bus," which is optimized for
transmission of image or graphics data.) Each of these three bus interfaces includes its
own control logic, preferably including a controller. For example, the VME bus interface
includes a direct-memory-access (DMA) controller, for expedited block data transfer.
However, the data transfer processor 120 provides high-level supervision for these
interfaces.

A critical part of this architecture is the cache memory 140. This cache memory
is not only very wide (256 bits), large (preferably at least 2 megabytes), and fast (100
nanoseconds access time as presently configured, and preferably much faster), but is also
effectively tri-ported. The memory is preferably only dual ported physically, and
arbitration between the control processor 110 and the data transfer processor 120 is
accomplished in their microcoded instruction scheme.

Note also that the three ports of the cache memory 140 are quite different. In
general, in most numeric processing subsystems it has been found that the bandwidth
between the cache memory and the number-crunching components is of critical
importance. Therefore, in the presently preferred embodiment, the port to numeric
processor 130 is much wider (and therefore has much higher bandwidth) than the ports
to the control processor and data transfer processor. In the presently preferred
embodiment, the latter ports are only 32 bits wide. Moreover, a set of fully parallel
registers is used at the 32-bit ports, so that all accesses to these ports are seen by the
cache memory 140 as fully parallel, i.e. as 256-bit parallel reads or writes.
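The effect of the parallel registers at the 32-bit ports can be sketched in a few lines of Python. This is only an illustrative model, not the disclosed hardware: the class name `HoldingRegisters`, its methods, and the choice to place word 0 in the least significant bits of the line are all assumptions made for the example.

```python
WORD_BITS = 32
LINE_WORDS = 8                         # 8 words x 32 bits = one 256-bit cache line
WORD_MASK = (1 << WORD_BITS) - 1


class HoldingRegisters:
    """Hypothetical bank of eight 32-bit registers in front of a 256-bit port."""

    def __init__(self):
        self.words = [0] * LINE_WORDS

    def write_word(self, index, value):
        # The narrow (32-bit) side fills the bank one word at a time.
        self.words[index] = value & WORD_MASK

    def to_line(self):
        # The cache side sees all eight words at once, as one 256-bit value.
        # Assumption: word 0 occupies the least significant 32 bits.
        line = 0
        for i, word in enumerate(self.words):
            line |= word << (i * WORD_BITS)
        return line

    def from_line(self, line):
        # A 256-bit parallel read is split back into eight 32-bit words.
        self.words = [(line >> (i * WORD_BITS)) & WORD_MASK
                      for i in range(LINE_WORDS)]
```

In this model the narrow port issues eight `write_word` calls, but the cache memory only ever takes part in a single `to_line` or `from_line` transfer.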

The interface to the numeric processing module 130 is so defined that multiple
modules 130 can be used in parallel, all under the control of a single control processor
110 and all accessing (preferably) a single data cache module 140. The extremely high
bandwidth of the cache bus 144 is an important factor in achieving this multi-module
capability.

The interface between the control processor 110 and the data transfer processor
module 120 provides significant efficiency advantages. In the
presently preferred embodiment, some significant features are used to improve the
advantages of this interaction. First, as is common in the art of microprogrammed
processors, both the control processor 110 and the data transfer processor 120 preferably
use variable-duration instructions. That is, some instruction types require substantially longer
cycle times than others. For example, to give extreme cases, a no-operation instruction or an
unconditional branch would require far less processor time than a multiply instruction. Thus,
it has been common to use variable-duration clocks for controlling processors, where the clock
generator looks at the instruction type being executed and adjusts the duration of the clock
interval accordingly, on the fly.

In the presently preferred embodiment, both the control processor 110 and the data
transfer processor 120 are clocked by a shared variable-duration clock. Thus, the control
processor 110 and the data transfer processor 120 are enabled to run synchronously, even though
they are concurrently running separate streams of instructions.
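A minimal sketch of the shared variable-duration clock follows, assuming that each instruction simply requests a cycle duration and that the shared clock adopts the longer of the two requests on each cycle. The function names and the durations used in the usage note are illustrative assumptions, not values from the disclosure.

```python
def shared_cycle_times(cp_requests, dtp_requests):
    """Cycle durations produced by a clock shared between two processors.

    Each argument lists the cycle duration requested by successive
    instructions of one processor. Assumption: the shared clock stretches
    every cycle to the longer of the two requests.
    """
    return [max(cp, dtp) for cp, dtp in zip(cp_requests, dtp_requests)]


def lost_time(requests, shared):
    """Total time one processor spends waiting on the stretched clock."""
    return sum(actual - wanted for wanted, actual in zip(requests, shared))
```

For example, with hypothetical requests of `[100, 100, 150, 100]` for the CP and `[100, 125, 100, 100]` for the DTP, the shared clock runs `[100, 125, 150, 100]`. The CP loses 25 time units and the DTP loses 50, and neither loses anything on cycles where both request the shortest duration.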

The control processor 110 is given priority on access to the cache memory 140. That
is, the data transfer processor 120 must check before every cache access, to ensure that cache
access has not been preempted by the control processor 110. However, to prevent lock-out, the
data transfer processor 120 has an interrupt signal available to it, which will command the
control processor 110 to release control of the cache port for at least one cycle.
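The priority rule and the lock-out escape described above can be modelled as follows. This is a simplified sketch under stated assumptions: arbitration is resolved once per cycle, and the DTP raises its interrupt after waiting a fixed number of cycles. The `patience` threshold is hypothetical; the text does not say when the DTP microcode chooses to interrupt.

```python
def arbitrate(cp_wants, dtp_wants, dtp_interrupt=False):
    """Return the owner of the cache port for one cycle: 'CP', 'DTP' or None."""
    if dtp_wants and dtp_interrupt:
        return "DTP"        # the CP is commanded to release the port this cycle
    if cp_wants:
        return "CP"         # the CP has priority and preempts the DTP
    if dtp_wants:
        return "DTP"
    return None


def simulate(cp_wants_seq, dtp_wants_seq, patience=3):
    """Grant sequence when the DTP interrupts after `patience` blocked cycles.

    `patience` is an assumed policy parameter used only for illustration.
    """
    grants = []
    waited = 0
    for cp_wants, dtp_wants in zip(cp_wants_seq, dtp_wants_seq):
        interrupt = dtp_wants and waited >= patience
        owner = arbitrate(cp_wants, dtp_wants, interrupt)
        grants.append(owner)
        if dtp_wants and owner != "DTP":
            waited += 1     # the DTP checked and found the port preempted
        else:
            waited = 0
    return grants
```

With both processors requesting the port on every cycle and `patience=2`, the grant sequence is CP, CP, DTP, repeated, so the DTP is never locked out even though the CP keeps priority.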

The three types of processor modules will sometimes be referred to by abbreviations in
the following text. For example, the microcode which runs in the data transfer processor module
120 may be referred to as the DTP microcode. Similarly, the microcode which runs in the
control processor 110 may be referred to as the CP microcode, and the microcode which runs
in the numeric processing module 130 may be referred to as FP microcode. These abbreviations
will be used regarding other features as well.

Design Goals

The subsystem of the presently preferred embodiment has been designed to give a very
high floating point number crunching performance with small size and at low cost.

The system contexts have been targets for use of this subsystem: this subsystem is well 
suited for use as a floating point accelerator for a wide range of general-purpose host computers. 
(In particular, compatibility with UNIX™ engines is desirable). 

It is also contemplated that the accelerator system of Figure 1 may be very
advantageous in a specialized picture processing system. An example of such a system
would be a graphics and image processing system, manufactured by benchMark
Technologies Ltd., and referred to as the "GIP" system. (The GIP systems include a
number of features to give very high throughput in a wide range of graphics and image
applications.) Such a system, including an accelerator subsystem like those shown in
Figures 1, 9A, 10, 43, etc., may be particularly advantageous for running three-
dimensional graphics algorithms.

The architecture of Figure 1 will be discussed in much greater detail below, but
first it will be informative to look at how this multiprocessor structure can be used.

As noted above, most algorithms can be broken down into four separate parts:
Control, Data input and output, Address calculations, and Data calculations.

The preferred architecture treats these as separate tasks, and maps them onto
the three processors. The control and address calculations are handled by the Control
Processor (CP) Module 110, the data I/O tasks are handled by the Data Transfer
Processor (DTP) Module 120, and the data calculations are handled by the Floating-Point
Processor (FP) Module 130.

The division of an algorithm between the control processor module 110 and the
FP is illustrated by the detailed descriptions below, regarding some specific algorithm
implementations. One good example is provided by the Fast Fourier Transform (FFT)
implementation discussed below, with reference to Figure 31. The FFT algorithm is
notoriously difficult to program efficiently.

In this example, the FFT algorithm is divided between the control processor
module 110 and floating-point processor module 130, by assigning the address calculations
for the data samples and phase coefficients to the control processor module 110 and the
butterfly calculations to the floating-point processor module 130.

The portion of the FFT software which runs in the CP module 110 calculates the
address of the complex data, as a function of the stage and butterfly numbers. The
complex phase coefficients are held in a table, and thus part of the software will also
calculate the position of the needed coefficients in the table, as a function of stage
and butterfly numbers. Once the addresses have been calculated, the data and
coefficients can be fetched and transferred over to the floating-point processor module
130. When the floating-point processor module 130 has completed the butterfly
calculations, the control processor module 110 will read the results and save them before
repeating the address calculations for the next butterfly. Note that the control processor
module 110 does not have to track the actual butterfly calculations; it merely interchanges
data with the floating-point processor module 130 at synchronisation points. Note also
that this software does not merely calculate addresses, but also controls the actual data
transfers between the cache memory and the numeric processor.
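The split between address calculation and butterfly arithmetic can be illustrated with a short Python sketch. It assumes a textbook in-place radix-2 decimation-in-time FFT with bit-reversed input ordering, which is a stand-in chosen for the example rather than the implementation of Figure 31. The function names are illustrative: `butterfly_addresses` plays the role of the CP software, and the two lines marked "FP role" play the role of the FP software.

```python
import cmath


def butterfly_addresses(n, stage, butterfly):
    """Data addresses and coefficient-table index for one butterfly.

    Computed purely from the stage and butterfly numbers (the CP role).
    Assumes n is a power of two and a table holding n/2 phase coefficients.
    """
    half = 1 << stage                       # distance between the two operands
    group, offset = divmod(butterfly, half)
    top = group * 2 * half + offset
    bottom = top + half
    twiddle = offset * (n // (2 * half))    # position in the coefficient table
    return top, bottom, twiddle


def bit_reverse(i, bits):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft(samples):
    n = len(samples)
    bits = n.bit_length() - 1
    data = [samples[bit_reverse(i, bits)] for i in range(n)]
    table = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    for stage in range(bits):
        for b in range(n // 2):
            top, bottom, tw = butterfly_addresses(n, stage, b)   # CP role
            a, t = data[top], data[bottom] * table[tw]           # FP role
            data[top], data[bottom] = a + t, a - t               # FP role
    return data
```

Note that the butterfly arithmetic never refers to `stage` or `b`. Only the address function does, which mirrors the division of labour described above.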

The portion of the FFT software which runs in the floating-point processor
module 130 calculates the butterfly by a simple linear sequence of instructions, and knows
nothing of the complicated address calculations needed to provide the correct data and
coefficients for each stage. The code for the data calculations can therefore be written
without reference to the code for the data transfer operations. In fact, if it is desired to
use a different design for floating-point processor module 130 (e.g. to use a different
floating-point chip set, or a low-level data path architecture which is better suited to
FFTs), then only this (relatively simple) portion of the software will require changing.
The execution of the CP and FP software occurs in parallel, and is pipelined so
that the speed at which an algorithm runs is determined by the slowest part.

Architecture Description

Some of the key parts of the subsystem of Figure 1 will now be described in
greater detail. However, it should be understood that this is still only a summary
description. Far greater detail will be provided below.

The Control Processor (CP) module 110 includes a 32 bit integer processor unit
(IPU) 240, a microcode sequencer 210, an address generator (AG) 230, and miscellaneous
items such as microprogram memory, clock generator, bus control etc.
In the presently preferred embodiment the integer processor unit 240 is a Weitek
XL8137, the sequencer 210 is an Analog Devices ADSP-1401, and the address generator
230 is an Analog Devices ADSP-1410. As will be readily understood by those skilled in
the art, a wide variety of other components could be used instead, or equivalent
functionality could be implemented in other blocks instead.

The control processor module 110 has two main tasks to undertake:

It controls the operation of the board (at a higher level), by interpreting commands
from the host, requesting transfers by the DTP module 120, and initialising
the floating-point processor module 130 before it starts data calculations.

It generates addresses for the data cache memory, and controls the
transfer and routing of data between the data cache memory and the FP module 130.
This activity normally occurs repeatedly during the actual number crunching process,
after the high level control operations have been completed. Loop control is handled by
the sequencer, so that the address generator and IPU can be used entirely for
address calculations.


Communication with other blocks is via a 32 bit wide data bus (CD bus 112),
which allows the control processor module 110 to read and write to the data cache
memory 140, command memory 190, and the control registers of FP module 130. The
control processor module 110 can be interrupted by the host (via the VME interface 160),
by the floating-point processor module 130, or by the data transfer processor module 120.
In normal operation (as opposed to program development and debugging) the only
interrupt source will be the data transfer processor module 120.



Data Transfer Processor (DTP) Module 120 (Fig. 3A)
The Data Transfer Processor (DTP) Module 120 is very similar to the control
processor module 110 from the programmer's viewpoint, in that it uses the same 32 bit
processor and sequencer. The bus control and interface control are obviously different.
One other distinguishing feature from the control processor module 110 is that the data
transfer processor module 120 has a microcode expansion port, which permits it to
control add-on boards (such as a bulk memory card or a network card).

The data transfer processor module 120 has three main tasks to undertake:

It controls the transfer of data between the data cache memory and the
external interfaces. (It does this in response to high-level commands from the control
processor module 110 (or from the host).)

It transfers commands from the external interfaces to the command
queues maintained in the command memory 190, for subsequent processing by the
control processor module 110. Any of the external interfaces can provide commands, but
initially it is expected that the VME interface will be the main source. Suitable software
will allow command lists to be held in the data cache memory (or command memory),
and be called as macros. (This method is sometimes called "vector chaining.")

In the debug environment, the data transfer processor module 120 is the
main interface between the debug monitor (running on the host) and the microcode being
debugged in the data transfer processor module 120, control processor module 110 or
floating-point processor module 130. It also gives the debug monitor access to the various
memories that are not mapped into the VME address space.

The transfer of data and commands between the external interfaces, the data
cache memory, command memory, VME interface memory, and the data transfer
processor module 120 occurs over the 32 bit wide TD bus 122. The external interfaces
150, 160, and 170 are FIFO buffered, and interrupt the data transfer processor module
120 when they require attention, i.e., when they receive some data or are getting empty.
Additional interrupt sources are the host (via the VME interface), and the control
processor module 110.

Access by the data transfer processor module 120 to the data cache memory is
limited to cycles that are not used by the control processor module 110. (The CP module
110 may be using the memory either for transfers to the floating-point processor module
130 or for itself.) If the data transfer processor module 120 is forced to wait too long for
access, it can steal a cycle by interrupting the control processor module 110.



Brief Review of FP Module 130

The Floating-point Processor Module 130 is located on a separate board, which
plugs into the main base board. The operations of the floating-point processor module
130 may be considered as having two distinguishable parts:

(a) The microcoded floating point unit. This section undertakes the
floating point calculations. The unit was designed to achieve one goal - to run as fast as
possible, in order to obtain maximum performance from the floating point hardware
devices. To meet these design aims, a very simple architecture is utilised. It includes
a floating point multiplier, a floating point ALU (arithmetic and logic unit), fast multiport
register files, and a very fast, but simple, sequencer. In addition, a scratchpad memory
is closely coupled to the inner datapaths, to hold lookup tables and provide histogram
storage. The floating point arithmetic units interface with the register files via two read
ports and one write port. Another write port is connected to one of the read ports, to
provide a data shuffle and replication capability. The final port is bidirectional and is
used to pass data into and out of the register files.

(b) The data cache memory interface. This part of the FP module
interfaces the data cache memory to the bidirectional port of the register files. There is a
set of bidirectional registers between the register file and the data cache memory which
pipelines the data transfers and also handles the data multiplexing and routing. The
control for the transfer is generated in the transfer logic. Note that many parts of this
interface, although physically located together with the FP module 130, are clocked with
the CP module 110, and will generally be referred to as an extension of the CP module 110
rather than as part of the FP module 130.

A highly multi-ported fast register file is a key element in providing a clean
interface between the control processor module 110 and floating-point processor module
130. One side of this register file runs synchronously to the control processor module
110, and the other side runs synchronously to the floating point processor module 130.
Thus, this clock boundary placement permits changes to be made on one side of the
boundary without affecting the other side. This provides a migration path to faster, or
more integrated, floating point chip sets, and hence floating point device independence.

Up to 4 floating-point processor modules 130 (or algorithm-customised modules
130') can be included in one such subsystem. Some examples of interest are shown in
Figures 9A and 10.

The Data Cache Memory 140 is a very high bandwidth, multi-ported memory.
The architecture of this memory and its interfaces provides tremendous advantages in the
overall performance of the system of the preferred embodiment. The high bandwidth is
necessary to keep the floating-point processor module 130 supplied with data (and to
remove its results), when the floating-point processor module 130 is undertaking simple
vector calculations. For example, a vector "add" operation requires 3 number transfers
per calculation; if the floating-point processor module 130 is able to sustain a calculation
rate of 20 Mflops, the memory bandwidth required to keep up will be 240 Mbytes per
second.
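The bandwidth figure quoted for the vector "add" case can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes 4-byte (32 bit) operands with two operand reads and one result write per calculation.

```python
BYTES_PER_WORD = 4  # assumption: each number is a 32 bit floating point word


def required_bandwidth_mbytes(transfers_per_calc, mflops,
                              bytes_per_word=BYTES_PER_WORD):
    """Memory bandwidth (Mbytes per second) needed to sustain `mflops`
    calculations per microsecond when each calculation moves
    `transfers_per_calc` words to or from memory."""
    return transfers_per_calc * bytes_per_word * mflops


# Vector add: two operands in, one result out, at 20 Mflops.
vector_add = required_bandwidth_mbytes(transfers_per_calc=3, mflops=20)
print(vector_add)  # 240
```

The same function shows how quickly the requirement grows if the calculation rate or the number of transfers per calculation increases.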

The data cache memory has a memory bank made up of 64K by 32 bit memory
modules, providing 2 Mbytes of on-board storage. This may be expanded by the use of
a remote memory expansion board 4310 which hangs onto the cache bus 144.
(Physically, this memory expansion module plugs into the same connectors as the
floating-point processor module 130 modules.) This memory expansion board, which will
have the same bandwidth as the on-board data cache memory, can be configured to store
an extra 12 Mbytes of memory in increments of 2 Mbytes. Using double capacity
memory modules, the on-board storage may be increased to 4 Mbytes and the off-board
to 24 Mbytes.

There are three ports to the data cache memory, one to each of the processors.
However, in many respects it has been possible to treat the memory as only dual ported,
because the data transfers to the control processor module 110 and floating-point
processor module(s) 130 are all controlled by the CP microcode. Data transfers for the
floating-point processor module 130 and control processor module 110 have priority over
I/O transfers, so the data transfer processor module 120 may be forced to wait until
there is a free memory cycle. If the data transfer processor module 120 is kept waiting
too long, it can interrupt the control processor module 110 and gain access to the
memory. This is not likely to be a problem, unless the control processor module 110 is
undertaking random accesses. Even then, for block I/O transfers, the data transfer
processor module 120 will require 8 cycles to transfer the data per memory access,
before it needs to request another block of data.

In order to obtain the high memory bandwidth with reasonable cycle time
memory devices, a wide memory architecture is used. The memory is 256 bits
wide, so that in a single access cycle, 32 bytes (8 F_words) are transferred. With the
memory cycling in periods of 100 ns, the memory bandwidth is 320 Mbytes per second
for block transfers and 40 Mbytes per second for random F_word accesses.
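The two bandwidth figures follow directly from the access width and the cycle time. A minimal sketch of the arithmetic, assuming 4-byte F_words and taking 1 Mbyte as 10^6 bytes (the function name is hypothetical):

```python
def bandwidth_mbytes_per_s(bytes_per_access, cycle_ns):
    """Bandwidth in Mbytes per second for one access of `bytes_per_access`
    bytes every `cycle_ns` nanoseconds (bytes/ns * 1000 = Mbytes/s)."""
    return (bytes_per_access * 1000) / cycle_ns


MEMORY_WIDTH_BITS = 256
F_WORD_BYTES = 4      # assumption: one F_word is 32 bits
CYCLE_NS = 100

block = bandwidth_mbytes_per_s(MEMORY_WIDTH_BITS // 8, CYCLE_NS)
random_access = bandwidth_mbytes_per_s(F_WORD_BYTES, CYCLE_NS)
print(block, random_access)  # 320.0 40.0
```

A block access uses all 32 bytes fetched in a cycle, whereas a random F_word access uses only 4 of them, which accounts for the factor of eight between the two rates.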

The data cache memory may also be used to hold microcode overlays for the FP
module 130. These can be transferred into and out of the FP module's writable control
store (WCS) when the floating-point processor module 130 microcode exceeds the WCS
size. The re-loading of the WCS via this parallel load facility occurs very much faster
than the normal serial load under host control. In fact, this capability is fast enough to
allow dynamic paging of the microcode.

The Command Memory (CM) 190 is a small amount of memory,
dual ported between the control processor module 110 and data transfer processor module
120. Command, control and status data are passed between the control processor module
110 and DTP via software queues or FIFOs maintained in this memory.
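A software queue of the kind mentioned here is commonly built as a ring buffer whose read and write indices live in the shared memory alongside the data. The sketch below is one possible arrangement, not the layout used in the preferred embodiment: the class name, the word layout (two index words followed by the data area) and the capacity are all assumptions made for illustration.

```python
class CommandQueue:
    """Single-producer, single-consumer ring buffer held in a word array.

    Hypothetical layout: mem[base] = read index, mem[base + 1] = write
    index, mem[base + 2 ...] = data words. One slot is always left empty
    so that "full" and "empty" can be told apart.
    """

    def __init__(self, memory, base, capacity):
        self.mem = memory
        self.head = base            # word holding the consumer's read index
        self.tail = base + 1        # word holding the producer's write index
        self.data = base + 2        # first data word
        self.capacity = capacity
        self.mem[self.head] = 0
        self.mem[self.tail] = 0

    def put(self, word):
        """Producer side (e.g. the DTP). Returns False if the queue is full."""
        tail = self.mem[self.tail]
        nxt = (tail + 1) % self.capacity
        if nxt == self.mem[self.head]:
            return False
        self.mem[self.data + tail] = word
        self.mem[self.tail] = nxt   # publish only after the data is written
        return True

    def get(self):
        """Consumer side (e.g. the CP). Returns None if the queue is empty."""
        head = self.mem[self.head]
        if head == self.mem[self.tail]:
            return None
        word = self.mem[self.data + head]
        self.mem[self.head] = (head + 1) % self.capacity
        return word


command_memory = [0] * 64           # stand-in for the dual ported memory
queue = CommandQueue(command_memory, base=0, capacity=4)
queue.put(0x10)
queue.put(0x20)
print(queue.get(), queue.get(), queue.get())
```

Because each side writes only its own index, neither processor needs to lock the memory; that property is what makes this structure a natural fit for a dual ported memory.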

Half of this memory is reserved for use by the microcode debug monitor, to hold
the control processor module 110 and floating-point processor module 130 state
information (as well as some command structures).

Brief Review of External Interfaces

The presently preferred embodiment includes several external interfaces. The most
important of these is the host interface 160 (also referred to as the VME interface). The VME
interface interfaces the subsystem of the preferred embodiment to the VME bus and
complies with the full electrical and protocol specifications as defined in the VME bus
specification, revision C1.

The VME interface operates in slave mode when the VME host is loading up
microcode, accessing control or status registers, accessing the VME Interface Memory
(VIM) or accessing the data FIFO. The slave interface does not support byte or word
accesses. It supports only 32 bit parallel accesses. However, the control and status
registers are 16 bits wide, and therefore a 16 bit host can still control the subsystem of
the preferred embodiment.

The VME interface operates in master mode when it is transferring data between
the data FIFO and VME memory under local DMA control. The DMA activity is
controlled and monitored by the data transfer processor module 120, which can also
initiate interrupt cycles onto the VME bus.

The Data Pipe Interface is designed to connect to a high-bandwidth backplane
bus. (Physically, this can be configured simply using ribbon cable.) This bus provides a
convenient mechanism for private inter-subsystem communication. That is, the interface
logic includes two receiving ports and one sending port, so that several busses of this type
can be used as short local busses, to provide a wide variety of system dataflow
architectures. The data transfers on this bus are buffered with FIFOs (at the receiving
end), and this architecture allows high speed, low overhead transfers. Multiple
subsystems can be connected in parallel or in series (e.g. in a pipeline), which allows very
high performance systems to be implemented easily.

As an example, a high performance, real time 3D graphics system can be
constructed with two accelerator subsystems and a picture processor, configured in a
pipeline. The first accelerator subsystem transforms and clips the polygons for frame n,
the second accelerator sorts the polygons into drawing order (for hidden surface removal)
for frame n-1, and the picture processor draws the polygons for frame n-2.

The DTP microcode expansion interface is virtually an extension of the DTP
module 120 micro address and data busses. It is 100% compatible (physically and
electrically) with the GIP microcode expansion bus, and can use any of the expansion
cards, designed for GIP, that use this type of interface. The external bulk memory
systems and network cards will connect to the subsystem of the preferred embodiment
via this interface port.

A Picture Data Bus Interface 170 (or "GIP Interface") connects to another bus
which is particularly optimised for graphics and image data. This interface also permits
connection to the GIP microcode expansion bus, which allows a small amount of interface
logic on the subsystem of the preferred embodiment to be controlled by the GIP
microcode. This provides a bidirectional, 16 bit wide FIFO between the GIP and the
subsystem of the preferred embodiment, along which commands and data can travel.
Each side of the interface can interrupt the other.

Control Processor (CP) Module 110

The control processor is a 32 bit microcoded processor based around a 32 bit
Integer Processor Unit (IPU) 240, which in the presently preferred embodiment is a
Weitek XL8137. The IPU 240 is supported by a 16 bit address generator (AG) 230
(which in the presently preferred embodiment is an Analog Devices ADSP 1410), and a
16 bit sequencer 210 (which in the presently preferred embodiment is an Analog Devices
ADSP 1401). The main data path within the control processor is the CD bus 112.

Figure 2A provides a general overview of the organisation of a control processor
110, in the presently preferred embodiment. A writable control store (WCS) 220 is a
memory which contains a sequence of microinstructions. A sequencer 210 provides
microinstruction address commands 211 to fetch microinstructions from control store 220.

The stream of instructions thus fetched from control store 220 is shown as 221. Note
that both an unregistered output and an output registered through register 222 are
preferably provided. The registered output from 222 is provided to decoder 260.
Registers 222 and 228 are both configured as serial shadow registers, and interface to a
serial loop 226. Note also that a portion of the microaddress stream is also preferably
provided on a line 211A, which will be communicated to the floating point module 130.
This has advantages which will be discussed below.

Note also that the flow on line 221 is preferably bidirectional. That is, this line
can not only be used to read out microinstructions from the writable control store, but
can also be used, under some circumstances, to write instructions back into the control
store 220. This is an important capability, which has advantages which will be discussed
below.

The microcode output 221 is provided as an input to decoder 260. In
conventional fashion, this decoder separates the fields of a microinstruction and decodes
them as needed, with minimal low level decode logic. The presently preferred
microinstruction format is shown in Figure 2B, and will be discussed in greater detail
below. The outputs 261 of the decoder 260 are routed to all of the major functional
blocks, including the address generator 230, the integer processing unit 240 and the
sequencer 210. Because these lines are so pervasive, they are not separately shown.

Note that the sequencer 210 receives inputs not only from the IPU 240 through
link register (transceiver) 214, and from address generator 230 via sequencer local bus
215, but also receives several other inputs:

A variety of interrupt lines are multiplexed through a multiplexer 213,
and these interrupts will generate the various alterations in the program counter
operation of the sequencer 210. Sequencer hardware for handling interrupts appropriately
is very well known.

Another multiplexer (shown as 212) is used to select among a variety of
condition code signals, for input into sequencer 210. These condition code signals are
used in the logic of the sequencer 210 in various ways, as will be further discussed below.

A buffer 217 is used to route constants which may have been specified
by a field of the microinstructions 221.

In addition, some further inputs and outputs are shown to the writable control
store 220 and microinstruction bus 221. A write enable line 224 is externally controlled,
e.g. from a host. In addition, a two-way interface 211B permits the host to write or read
to the microaddress bus 211. This capability is useful for diagnostics, and also for writing
microinstructions into the control store 220, as will be discussed below.

A clock generator 260 receives cycle-duration inputs from both the control
processor 110 and the data transfer processor 120. The duration of the current clock
cycle is selected on the fly, in accordance with the longest duration specifier received from
the CP and DTP modules. This is preferably implemented using a programmed logic
array (PAL). As with decoder 260, the outputs of the clock generator 260 are so
pervasively routed that they are generally not separately shown.
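The selection rule described for the clock generator (use the longest duration requested by either processor) can be sketched as follows. The two-bit encoding and the table of cycle lengths are hypothetical values chosen only for illustration; the text specifies only that each module supplies a duration specifier and that the longest one wins.

```python
# Hypothetical mapping from a two-bit clock-control code to a cycle length in ns.
CYCLE_NS = {0b00: 100, 0b01: 125, 0b10: 150, 0b11: 175}


def select_cycle_ns(cp_code, dtp_code):
    """Return the duration of the current clock cycle: the longer of the
    durations requested by the CP and DTP microinstructions."""
    return max(CYCLE_NS[cp_code], CYCLE_NS[dtp_code])


# The CP asks for a normal cycle while the DTP needs a stretched one.
print(select_cycle_ns(0b00, 0b01))  # 125
```

Because the choice is made afresh on every cycle, a slow operation in one processor stretches only the cycles in which it actually occurs.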

Figure 2B shows the microinstruction field allocation in the presently preferred
embodiment. Note that the allocation of fields in the CP extension logic is also shown.
The operation of this extension logic will be discussed in greater detail below. However,
at this point it should be noted that the additional bits of microinstruction format in this
extension field, and the WCS extension which stores these additional fields for each
instruction in the primary WCS 220, and the logic which decodes and executes these
microinstruction fields, are all replicated for each numeric processing module
130 or algorithm accelerator in the subsystem. Thus, the embodiment of Figure 10
would include three WCS extensions, and the total CP microcode field would be 192 bits.

Note that separate instruction fields in the primary instruction are allocated for
the integer processing unit 240 (32 bits), for the address generator 230 (10 bits), and also
for the sequencer 210 (7 bits). In the extension fields (which would be stored in each
WCS extension), fields are allocated for register select, condition select, and transfer
control. The use of these bits will be discussed in greater detail below.

Other instruction fields are allocated in ways which are fairly conventional in the
art of microcoded architectures. For example, a bit is used to indicate that a breakpoint
has been reached, several bits are used to briefly describe the instruction type, two bits
are used to encode the clock control (to permit the variable-duration clocks, as discussed
above), etc.

The address generator 230 is an off-the-shelf address generator unit. The
calculations which can be performed by this unit enhance the rapid address computation
abilities of the control processor 110.

In addition, the integer processing unit (IPU) 240 provides still greater arithmetic
capability. The IPU can read and write from the CD bus 112, and can also output
addresses onto the CA bus 111 (through the register 241). These addresses, as may be
seen in the high level diagram of Figure 1, provide address information to the cache
memory 140, and also to the command memory 190.

The actual component used for the integer processing unit 240, in the presently
preferred embodiment, has significant arithmetic capability, including the capability to do
multiplies in hardware. Thus, units 230 and 240 together provide a large amount of
arithmetic hardware available for the purpose of address generation. In addition, of
course, the sequencer 210 includes some logic which also performs the function of
microinstruction address generation.



Note that the address generator 230 has an output 231, which is buffered and
connected back onto the CD bus 112. The sequencer 210 can read the outputs of the
IPU 240 (through link register 214), but the IPU 240 can also be commanded
to drive the CD bus 112. The cache memory 140, the FP module 130, or the command
memory 190 can also access these results, once they are put out on this bus.

Register 203 (shown at the top left of Figure 2A) stores control
signals. These include signals for diagnostics, LED control signals, etc.

The IPU 240 contains a 4 port register file 1110, an ALU 1120, a field merge unit
1130, and a multiply/divide unit 1140. A simplified diagram showing these components
is shown in Figure 11. The two external data paths are shown in this figure as the D
and AD buses 1101 and 1102. In the control processor module 110, the AD bus 1102 is
connected through register 241 to serve as the address bus to the various memories, and
the D bus 1101 connects directly to the CD bus 112.

The IPU 240's four port register file 1110 allows, in a single cycle, such
operations as r1 = r2 + r3, in addition to a write into the register file via the fourth
port. The ALU 1120 provides all the usual arithmetic and logical operations, as well as
priority encoding and bit or byte reversal instructions. The field merge unit 1130
provides multi-bit shifts and rotates, variable bit field extract, deposit and merge
functions. The multiply/divide unit 1140 runs separately from the rest of the IPU 240:
once it has started doing a multiply or divide operation, any other non-multiply/divide
instructions can be executed by the ALU 1120 or field merge unit 1130. The multiply
operation is 32 by 32 signed (8 cycles), and the divide operation is 64 over 32 bits
unsigned (20 cycles).
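The field operations attributed to the field merge unit (rotate, variable bit field extract, and deposit) have simple software equivalents. The following sketch models them on 32 bit words; the function names and argument order are illustrative assumptions and do not correspond to actual IPU opcodes.

```python
MASK32 = 0xFFFFFFFF


def rotate_left(word, count):
    """Rotate a 32 bit word left by `count` bit positions."""
    count %= 32
    return ((word << count) | (word >> (32 - count))) & MASK32


def extract_field(word, position, width):
    """Return the `width`-bit field of `word` starting at bit `position`."""
    return (word >> position) & ((1 << width) - 1)


def deposit_field(word, position, width, value):
    """Return `word` with the `width`-bit field at `position` replaced by
    the low bits of `value`; all other bits are left unchanged."""
    mask = ((1 << width) - 1) << position
    return ((word & ~mask) | ((value << position) & mask)) & MASK32


print(hex(extract_field(0xABCD1234, 8, 8)))        # 0x12
print(hex(deposit_field(0xFFFFFFFF, 4, 8, 0x00)))  # 0xfffff00f
print(hex(rotate_left(0x80000001, 1)))             # 0x3
```

In address generation these operations are typically used to pack or unpack index fields, which is why single-cycle hardware support for them is valuable.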

Register 241, external to the IPU 240, is used at the interface to the CA bus 111.
This introduces a pipeline delay when accessing memory. (This register is necessary
because, with the specific part used here, the AD bus is not valid until 76-90 ns after the
start of a cycle.)

The microcode instruction input to the IPU 240 (on a "C" bus 1103) is registered
internally (in a register 1161), so the microcode instruction is taken directly from the
writable control store (WCS).

The configuration of the address generator 230 used in the preferred embodiment
is generally shown in Figure 12. Key elements include a 16 bit wide ALU 1210, 30
internal registers (functionally grouped as 16 address registers 1222, 4 offset registers
1224, 4 compare registers 1226, and 4 initialisation registers 1228). Also included are an
address comparator 1230 and bit reverser 1240. An internal bus 1200 provides data
routing, and a "Y" bus 1270 provides address outputs 231 which are fed back onto CD
bus 112 (when output buffer 232 is enabled). The "D" bus 1200 is connected to provide
inputs or outputs to the sequencer data bus 215, which is separated from the CD bus 112
by link register/transceiver 214. The actual device also includes an instruction decoder
and miscellaneous timing and glue logic, not shown.

These features allow the address generator 230, in a single cycle, to:

output a 16 bit address,

modify this memory address by adding (or subtracting) an offset to it,

detect when the address value has moved to or beyond a preset
boundary, and conditionally re-initialise the address value.

This latter step is particularly useful for implementing circular buffers or modulo
addressing.
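The output, modify, compare and re-initialise sequence described above is what makes circular buffers cheap to address. The sketch below models one address register together with its offset, compare and initialisation values; the class and attribute names are hypothetical, and the model assumes a positive offset and a boundary that is reached before the 16 bit address wraps.

```python
class AddressRegister:
    """Software model of one address register with its associated offset,
    compare and initialisation registers (16 bit values)."""

    def __init__(self, initial, offset, compare):
        self.initial = initial & 0xFFFF
        self.offset = offset
        self.compare = compare & 0xFFFF
        self.address = self.initial

    def step(self):
        """Output the current address, then post-modify it by the offset.
        If the new value reaches or passes the compare boundary, the
        register is re-initialised (assumes a positive offset)."""
        output = self.address
        updated = (self.address + self.offset) & 0xFFFF
        if updated >= self.compare:
            updated = self.initial
        self.address = updated
        return output


# A four-entry circular buffer at 0x100 with a stride of 4.
reg = AddressRegister(initial=0x100, offset=4, compare=0x110)
print([hex(reg.step()) for _ in range(6)])
```

In hardware all of these steps happen in one cycle, so a loop that walks a circular buffer needs no extra instructions for the wrap-around test.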

The address generator 230 augments the address generating capabilities of the
IPU 240. However, the particular chip used for the address generator 230 can only
generate 16 bit addresses, if operating directly. (Double precision addresses would take
two cycles, or two chips can be cascaded.) In the presently preferred embodiment, the 16
bit address outputs of the address generator 230 are passed through the IPU 240, where
they can be added to a base address and extended up to 32 bits.

The address generator's registers are accessed by its 16 bit wide D port, which
is connected to the same local bus 215 and link register 214 as the sequencer.

The addresses come out of the Y port 1270 (shown as line 231 in Figure 2A).
The addresses are passed through a three state buffer 232 before connecting to the CD
bus 112. When either the address generator's D or Y port is read (i.e., is called on to
drive the CD bus) the 16 bit values can be zero extended or sign extended to the bus
width (32 bits). The logic which performs this is located in sign/zero extend PAL 216,
which is discussed in greater detail below. Zero extension or sign extension is controlled
directly from the CP microcode. (This feature is available when any of the 16 bit wide
ports are selected to drive the CD bus.)

The instruction set of the address generator 230 is divided into the following
groupings:

Looping,

Register transfers,

Logical and shift operations,

Control operations, and

Miscellaneous operations.

The microcode instruction input to the address generator is registered internally,
so the microcode instruction is taken directly from the WCS 220.
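The zero and sign extension applied when a 16 bit port drives the 32 bit CD bus, and the subsequent addition of a base address in the IPU, can be illustrated in a few lines. This is a sketch of the arithmetic only; the function names are hypothetical and no claim is made about how the PAL itself is programmed.

```python
MASK32 = 0xFFFFFFFF


def zero_extend16(value):
    """Place a 16 bit value on a 32 bit bus with the upper half cleared."""
    return value & 0xFFFF


def sign_extend16(value):
    """Place a 16 bit value on a 32 bit bus, copying bit 15 into bits 16-31."""
    value &= 0xFFFF
    if value & 0x8000:
        return value | 0xFFFF0000
    return value


def effective_address(base, offset16, signed=True):
    """Add an extended 16 bit address to a 32 bit base address."""
    extended = sign_extend16(offset16) if signed else zero_extend16(offset16)
    return (base + extended) & MASK32


print(hex(sign_extend16(0xFFFC)))                  # 0xfffffffc
print(hex(zero_extend16(0xFFFC)))                  # 0xfffc
print(hex(effective_address(0x00200000, 0xFFFC)))  # 0x1ffffc
```

Sign extension lets a 16 bit value act as a negative displacement from the base, whereas zero extension treats it as an unsigned index; this is why the choice is left to the microcode.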

Sequencer 210 and Support Logic

In the presently preferred embodiment, sequencer 210 employs an ADSP 1401.
Key elements of this particular implementation are shown in Figure 13. These include
a 16 bit adder 1310, a 64 x 16 bit RAM 1320, interrupt logic 1330, interrupt vector
storage 1340, and four loop counters.

The internal RAM 1320 can be used in three ways:

As a register stack: This allows up to four addresses to be saved on the
stack when entering a subroutine. These can then be accessed by a 2 bit field in the
instruction.

As a subroutine stack: This provides the normal return address storage
for subroutine linkage and interrupts. It can also be used to save other parameters such
as the status register or counters.

For indirect address storage: This allows an area to be set aside to hold
often used addresses. These are accessed using the least significant 6 bits of the D port.

Stack limit registers 1321 protect against one stack area corrupting another, or
stack overflow and underflow situations. If one of these occurs then an internal interrupt
is generated, so an error condition can be flagged or the stack extended off-chip (stack
paging).

Ten prioritised interrupts are catered for - two internal to the device, for stack
errors and counter underflow, and eight external. All the interrupt detection, registering
and masking is handled on-chip by logic 1330, and the corresponding vector is fetched
from the interrupt vector file 1340.

The instruction set is comprehensive, with a wide variety of jumps,
subroutine calls and returns. Most of these instructions can use absolute addresses,
relative addresses, or indirect addresses to specify the target address. They can also be
qualified by one of the selected conditions:

Unconditional. Execute the instruction always.

Not flag. If the condition code input (called FLAG) is false then execute
the instruction, otherwise continue (the usual fail instruction).

Flag. If the condition code input is true then execute the instruction,
otherwise continue (the usual fail instruction).

Sign. Execution of the instruction depends on the sign bit in the status register.
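The effect of the four qualifiers on a jump can be modelled as a small decision function. In the sketch below the enumeration names, the `next_address` helper, and the convention that the Sign qualifier passes when the sign bit is set are all assumptions made for illustration; a failed condition simply continues to the following microinstruction.

```python
from enum import Enum


class Qualifier(Enum):
    UNCONDITIONAL = 0
    NOT_FLAG = 1
    FLAG = 2
    SIGN = 3


def condition_passes(qualifier, flag, sign):
    """Decide whether a qualified instruction executes.
    Assumption: SIGN passes when the sign bit is set."""
    if qualifier is Qualifier.UNCONDITIONAL:
        return True
    if qualifier is Qualifier.NOT_FLAG:
        return not flag
    if qualifier is Qualifier.FLAG:
        return flag
    return sign


def next_address(pc, target, qualifier, flag=False, sign=False):
    """Return the next microaddress for a qualified jump: the target if
    the condition passes, otherwise the following instruction."""
    if condition_passes(qualifier, flag, sign):
        return target
    return pc + 1


print(next_address(0x40, 0x80, Qualifier.FLAG, flag=True))      # 128
print(next_address(0x40, 0x80, Qualifier.NOT_FLAG, flag=True))  # 65
```

Only one condition input is tested per instruction, which is why the external multiplexer that feeds FLAG determines which status signal a given jump actually tests.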



There are also instructions to do stack management, status register operations,
counter operations and interrupt control.

The microcode instruction input is registered internally, so the microcode
instruction is taken directly from the WCS (unregistered).

The sequencer support logic falls into four categories: interrupts, condition code
selection, micro address bus, and constant field.

Interrupts

The chip used for sequencer 210, in the presently preferred embodiment, only has
four interrupt input pins. Therefore an external multiplexer 213 is used to extend the
number of available interrupts to eight. The interrupts are mainly used for
communication and to support debugging tools.

The interrupt sources are (in order of highest priority first):



Within the debug environment there are, nominally, two tasks running: the
monitor task and the user task. The claw logic allows the user task to be single stepped
without single stepping the monitor task as well. The claw logic "claws back control" to
the monitor task after one instruction in the user task has been run. The instruction
that returns control to the user task requests a claw interrupt. Since this is delayed
by one cycle, the interrupt occurs on the first instruction executed in the user's task.
Thus control is returned back to the monitor task before the next (i.e., the second)
instruction in the user task is executed.



fait to 



This interrupt level is connected directly to a microcode bit, so that whenever
this bit is set an interrupt will occur. This provides a convenient mechanism for
implementing breakpoints. The instruction with the breakpoint bit set will be executed,
and then control passed to the breakpoint handler. Any number of breakpoints can be set.


ihto 



The VlIB bui intemqit to 
ibould not be used duriiv 



uaedotttf fbr 



the 




When the floatiny^poiBt 
ita WCa the TP clocks 



TUi Urteiiupi to used to ftcee the 
Boeeeea to the data eaefae menotr. Thto 
nodule 120 to gsto aooaaa to esKbe 14a 



OMduIe 110 to temporari^ 
the data 



This interrupt is the normal method for the data transfer processor module 120 to
inform the control processor module 110 that there is a command in the CP command
queue (in command memory 190).



to inform the 
FIFO (to 
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EmUl 



TUs iBtampt ii UMd Um ^ 
eoQtrol prncwim module 110 that m dite 



nodule 120 to bOoem the 



FIffttinff Pflint ] 

IWi teterrupt ceo be e. a feeull of tto 

of ea emr (wtwoe type ew be deOaed la lelfcwai^, « a 
■eeor module laOi Tbe active events om ealeeted li^ 
TUeintflfniplifraoluaodatp«Mttt,aBdtoi Pweeaeor module no < 

FP aodniM m the four FPi 1 

will thert^ Dead to idnti^ wbkh caueed the kitmpt» in 



bffealqMSiBt ia the 

r.oiitheFP 



ftrftteauee. la 
tfali toAemipt. Thelatem^t 



Note The iatempta 
eode logics «o that if k to 



For the intempta 



with 



the 



TUi OBB be potted if 
Urn an of andtipleaar 214 to 



>beteitadtqrthe 
to ba polled 



bthe 



F1FO< 



To 



tfaeaaateori 

the four low 

For the fbw hiilheit prtority iatiirupta to be leoepte^ 
»**f of the ndflffnc o d t dock. For the 
the tine Hmlt ie 15 im bofbra the flOfav «^ 

taterrupt niput iB hM hVi Or one 
Q«lo to nwpff M Mj, ao the iatemipt 



No 



Inteiiupt 



The sequencer has a single condition code input called FLAG, and all of the
testable status signals are multiplexed into this pin. This is registered internally, and has
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a nannd .et thM rf 10 M when no to niMked (count* ua^^ 
iia»h««nablad. Ttw poirtty of theFLAQ tout am h«rh«.fU i,^ thCDoqtjmu. 

aaen in Figure iOA. addWoort togic h p«fc»rt^r t-ad t« 

5 »«««|u«w-aiOteb«fti|^Mterad«fUr«iintanapt. 

A PAL 4081 to to wadrte tho tetmal fflp^ 40aa tadd. tiM, «« 
210(0,310). 11^PALtoth„.op«t«.-u^«.^;X-/llJl^ 
can arte under cnfwtttiooi am ihomn I& Figure 40tBL 

WI« „ tot«upt oon«. th. «,p«e» iHB to an iate^ 

10 U*««.in.tta„ri„rio«.|«^ 

ll-gcoodaknwWeh«d»*edbaft»tb«ln«otwpt At tho ond th» iBt«iu|* 1 



2™^^*** """""i*"- aia to rnmmaiiili il to pratido tbo output oTtho PAL 4081 tli 
FLAG tout to -iu .Dtc « 210. Thtor— tf-i-^ ^. _ 



itlMiatemd itate oT fl^>>aop 4000. Thto 

16 Tl*iip«ticMte^topoftM*tftlioio*«^ 

to . eoodiltatf teneb. C««<» «-t««ttoa tlio iDt«i th^ 
> wfll beeomctif ( 



OTooin^ tlito logto would not bo 

to od» 

20 



. . . ■ . ^ ^ ' awiBMiBot^ Hcmerer. it 

> ad«ntam w«h ttao oorttoutor — »~ uood ta tho 



Tho twtabia atatw I 

IPU 240 eoodtttaa coda output: tbto rai^ tho statu* cT the 
WW* conditfan to Indtoatad by tha IPU output on thto pin to defined ly tha 



a» '"■PfTWi to* statue hit in the VMBinterftcaoontioI I 

and to ueeAd ftr (BaffMiatto aoftwaro. 

WHta flai^ 0 and L- HieBe two aluato aDow batter aoeeaa to tha hiten»l 

» th. data cache aaaoi, writa togte and a« «««r uead h7 tba *at. e«o and 
leeda to tha defai« mritor. 

» 3*"'«-»««'n*tote.»adwhen«tunringlh«aninte,n.pt.eoth.t^ 
juBift ««o>t«l«wact^awnifltwaadhphcodlyajuinp totha 
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FP «ati» CVWArn This k dMnd wfan U» FP hM llokiMd to 
lad ia waUam tat \ 



FP itatua aipMd FFWATT: "nta to c1hb«4 wbaa tba 
S Moduto UP h— gnkbed iu caleuktfcw >nd U .mitfa, fa, ^- 

FP itataa riyiai baak_Miaet: TUi iBdkatM wlilob hair or the FP I 
ate to riloo-ud to th. cootrol p«««or awM. UO, wim tht «^ 
th* dauUa biArad doglnl) aoda. «<■ 

FPiMttuBilpHi: SMtalloep. TUiii«itraetadft«atlMM«flrthtMrW 
10 lo«ptl»tw»>tl»«. ^ tl.^flo^ prf,t,t^.^a^„ -naiat^to-totMiMril 

lo^ •■•» tha cBotwl pcooeMor n»*ito 110 to ertMct (Md towt) th» totMrt itrtw fl* 



'^«M*^(ifn«akiB(pefnita) «lMiM««rtlMfr ^ 

W to tear. Tl»il»i*FPWiaTaii4CPWiOT «*ict^ taBta4*»'tIiJ^ 

— — ih<|h aprin beftra tha CP em tha Inliaiinn tiwrcaL 

FP Matua aigttal: »Jhraat p oi ut . Tfeia ia oalr wad Or " " mhrn Md to 
tfaoFPUta 



FP Hatea »_amr. IMa to aat whanaiii aa atwracaw In tha 
** *''**W|^^P»M«» noduto laa ttbaabaMiindudadteftrtwaiM. 

tobataatadtoadaetadtTthapottooorCPi "** 





The condition output from the IPU 240 is valid too late to meet the [...] set-up time

(especially as it will be delayed by a multiplexer), when cycling in 100 ns.
When testing this condition the clock will need to be stretched to 125 ns.

! tiv bepar to ba« doaa bf uriDff ooa of tba eouotan iBtetul to tba 
tbaZPU SMferaddiaaacalnhttooai Voa4aofa caiM ha , 
tb* IPVm but ttato iRNdd i^Miaa aitm omtaeMl dua to • J 



Iba eoBanoa eodaa aaa m»H l n i...i. i an 8 to I miittliiiimii au toto tba 



80 -FIMF tapiA iD tba aaquaoear. Tba Mqua^a tetanaQ r ragtot... «id 
pokttly «r tba aatoetad eoodttka ooda 
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Th. iBkRMddraa bua ill aWA « be drtm Aoa t«o wmattK from Uu 
•«pia«» 210 during iwnutf peogpm «aait^ 
* VMS hu. «n d«. ««1 eoo^ 

wbrt addriMi th* MMiuMiMr to at. TUm im daoa m^oiiaoaautfy to ibm mmmaem, 
opentteo, aad to main^ uMd ft* ^ 

•n» Biera^ddre. bui 211 to aho routed onto the FP aoduto Mmm 

each FP oMduto. "iaw «Mre« bua ertaoiioQ ailA om aho be »^ to dffc. »k- 
PPWCa TOarapaMMycaabeuwMftrtwoteaaMa; -»«~«» 
1) Aa a aeaas Ar the taoik to iMrida aD wkbea wfan the 9P I 

to deim loaded. 



Z5 ef n ch r omaMte with the oootfol 



a> Aa a aeeiiaiifam «4ifch oouU be need te nuii« tbe FP 

Deanr moAiie 110^ ee that the aiMif of the 

aiO in the isoatK* ptoeoieor module 110 to »^ Mthar than the totewa^r 
me. cnria capahffity to not Breeani in th» f^x^-tpg i—^-tM nmli n lib lim 

to noted ae a raedQr avaitoble altaraatiM.) 



20 



i^tatM. -.^^ rrmrtim fUM nTlln mliiiiliMliiii ihai to nrtitf wadtopravide 



to the •equeoeer. but eea atoo hold eoutanta ftr the ( 
The Mv>eoo.r 210 ha. a Wdtoeetfanel coonectton to a pri«f 'tocai bu. (the 
databua2U). 1W» P«Bto jua** ott, to be done in pawBal wtth acttooa 



» uiiog the CD bu. 113. l*ea««««rd.*.bo.toBakedtotheCDb«am^a 



Itok na^tr/tnaaetinr 814. The tiaHtg at the etoeka and the fted 
tt«y ««tt«iltotfa.aakretfitarai4a,ei,,ri^ 

dh*iU^ beeauae the wwiee and deMfnatkaa all h»e diflbrait foqufaementi, Note 

that tha addMOB BBnefator dtta iniNit 1380 to ocnnoetod to the aequaoo» Mde oT thto 

» beeauae th.«Jd«.agBn«rtflr 280 ha. the «n»th»tagwviin^^ 

MKpwmnr tee tnnafen oo thto bua. 
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CooaUsft Oeftd •> Sequeooer (Jumps) 
Constant field -> CD bus [...]
Sequencer -> CD bus (Diagnostics)

CD bus -> Sequencer (Computed jumps)

Constant field -> [...]
Address Generator -> CD bus
CD bus -> [...]



0 



10 



SSL 



Mftmd onbodBoit. Um WC8 
pnvMs 8K bsr 83 fete or 

228 for lottdhv iBicraodt nd Ibr 
wia be iiiriMiuj in 



15 



280^ and 



tte 



'2101aMUiflk 
WCS29a 

^ ttmM bm noted UmH tfas cootni 
&i addttloa to Um priamj WCS 
20 loeated on tha bM bowO, 

i496raoei««tba 



25 



with 

CIba< 

) vntaaoTtfaa 
Tba IPU 240t M 




WCS 220 oottlaiM 90 bita or 
400 each qm tha 1^ raaca or 
or tha #CS eatanteia Ob each or tha 
32 btta ortetniecion aft oadi 



Tbaoparatloocrcha 
100) otti read from 
At thta pcrfnt^ 



loop faPf wfafch tha 
^vttta to aa or tha 



thraqgh the VICB 
wffl ba rtiicumii J bl 
wflt bai 



J304509 



A« noM. NtfMw 222 pmidM « regtetored mfcwhwtnietioa output, to tl» 
1» «rf t« ai«v other logic md ««oiy eoiBpooaBt* An unitgtat««| 
output 221 1. ako pnvided. to oonpoaoit* wUcfa iatetnd 
te«»«»ctioii MgiMoriiw (For eampte. tte IPU 240 ha* lBt«iMl inMiuetkp plptfa. 

It atao hM aophMcMd intanuri <lM»da logle. Note tiat tte IPU MO abe 
»• ro*rt««l ooatw4 WU ftom tl» d8«hr aeo, ftft output OBrtlt ^Mh,) 

•a.«*t.raaato«tu-(y.«Hrt*irfow.,,iit«. u «* oo«r p««fch. • 

> Uttoughput» bill alM hM A MrW aeoM Bpd*. TlMMriali 
i ftr iniefftet to the aerial loop tecriM babir* 

pgQrida aaarial oa^ (wfaan dcMndaiD whlA .^..^ ^..p^J !^^ 
221 (or, coovme^g to write the ftil width oT aa iaalRietioa 221 back iato tha < 
•ten 220). aO tail flelda Buafc be aoooMd 



ia ahowB aa a biiffrnrflraial connacti oo 211& Tt^ to tha CP 



boa, whfchia coooactad to the microcode 
ThiB aane bqa paovMea the mfarnadit aw fataAi^ to ^ rf ""utmi etcraa 
20 iatha,,.l«tt,«e|*forthaIOTeoaliol^ Tba mfcroaAfraa. Hna in the 

pair in the VMB : 



25 laaot 



As oBponaaa netiMartfa*«MMtoiift»^ 



but dtoeetbr contNii aa dau tranitea to and 



the nwnsMna ofocaaa^ ....^ 130. BIdal cT thia loglo to plvrical^ oo the PP 



but to eontroOed by the aieroeoda of thecoatnii pfffwieaia muilulu XlO, 
totheCDbuB. HitokgtotoaKwaadinaMchfranterdetaai 

oTtheFPi 



90 
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Tlw outd* ragiiten 203 hoid tte or Sttto uMd 
JuitiQr dedkated mkroeiMte btta. Hm Qwds bits m 

Ftow through or regiitcr eonftroi oC tho rmd holdta^ ragtatm Ml. 

Flow ttarouglk or regliter eontrd of tte write boldli« raglstert 561 (Um 
regtotm afaown m 661 la Figagm 5 are wtuaQr doufaM» aad on 



PP 



mode for tfa» holcBi^ 
Two LBD OQotrol ligni^ 
Module idect (3 btttit thto 
130 aad/Sor algocttlim eeeelmun 130^, 



themoit^ 



One of the functions of decoder 260 is to decode the microcode CD [...]
to control the output enables of devices that can drive the CD bus. It [...]



^ iffrUm ffnM tn mnw loitj i luie tu mm l-U ij„ -L■_^ ^, 



fay witte Site 4iM from the do^gnmtor). Moot oT the porta oo the CD bw < 
raedaDdwrittea^aomiaiferesiaterelatheiro OfotetlMi 
flftheCDaoureeaaad dft a tinatinna wfflUecBtf^ 
4101 retbM* than fagr that in the daeoder 960J 
PoaiMeCaD bua aoureoo and daatfaMtkni iocfcidoc CPU 340; i 
Date cache nanary hoUtaf regMara SOOA; BIbde raglatar (8 bita); 
reglatar***; FP eoBtrol reglatar * **; Stvt eddwa reglitar * **; : 
(8 faita) Statue regiater (aoQiee 014^ ^ Adihe« gsneftttor eddrl» port 

didn port aeq u anear date port *n Contat / oast eddreaa field (e 
ooW ff iigiitBii narked •are part of the CP Eaten^o Logfe 410t whfch ial 
oQthefP flBodnle. Theee reghtere ve aalacted fagr OeUi in the eitendad CP i 
wtaloh la yered in thaWGSetteneioa 400. On^ the ealaeted moduM) leopond to the 

fatau ^Ste m ci thern la aalBcte^ the 







Uiiltlit<ktete«iUMr 
Note ttei oQlr 

240's iDstruetiQa field, to thst it am 



or smeitanMup (o tlwbia width of 

into th« IPU240 it under 
ita from the CD bus at the 



«od thet they 
ofthelPU 
» time it Is 



Ths dock generstor 250 produces the 
modulo 110 (iM tho dots 
cyd»-duntiaa inptzta from both the cootrol 
lao. nsdumtta of the omnt dock cycle k 



used throughout the 
module 120). Ui 
UO«iidthe<ktetim^ 

OQ the ia innn iImuh with 



^ durettoo of the two rocej^edfrttn thmmtm^ XYTP 

This generator is preferably implemented using a [...]
This PAL generates one of four predefined [...]

have different periods, namely 4, 5, 6 and 7 times the input clock period [...]
to 100, 125, 150 and 175 ns when a 40 MHz oscillator is used, as [...]

Pour dock outputs ere produced. AD 

CO the 

cktt^apipdhiedoek,o 

The oDicrooode dock Is slw^Fs hWite a 

Is low for % 3. 4 or 5 <9do% w selocted the i9de 
the 




The 

» dock osn be Ashley 
ThewritMsfali 
MA but fotuns h^ 1 i9de before the 



providee the 



for write 



loTthe 



•e the fflkroeode do^ bitt the 
doeknmning; formkroeodekmdii^ 

rooecyde after the ak roco d e dock 
ckxkdoea 

for all *f thn numwrins and 



the timee-two dock 
•dfB occurs at the Si 



at twice the frequent the mkrocods do^ doe% 
osathemkrooodedock«^ Tlds is a 







wfafcfa to mud ontr by th» Tnt t f i r Prnnn i ui Unjf 240 md $40, Tte IPU« 
uMtUidoektodoekUMlr OntariMl) onittip^AfivUt logk;. in oftler to radon the tins 
k fbr ihemnultiHTeift AmetkiM. 

The cycle length of the clocks [...]
[...] The cycle duration for each instruction [...]

microcode assembler, and is included [...]. This increases
performance over the case where a fixed cycle length is used, in which case all
instructions would have to take as long as the slowest instruction. In the presently
preferred embodiment, four cycle lengths are supported, of 100, 125, 150 and 175 ns.
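
The relationship between the 40 MHz oscillator and the four supported cycle lengths can be sketched as follows. This is only an illustrative model, not the PAL equations: the function names and the mapping of a 2-bit select code (0 to 3) onto the multiples 4 to 7 are assumptions. The rule that the longer of the two requested durations is used follows the description of the shared clock generator elsewhere in this specification.

```python
OSC_HZ = 40_000_000                  # 40 MHz oscillator, as stated in the text
BASE_NS = 1_000_000_000 // OSC_HZ    # one input clock period: 25 ns
MULTIPLES = (4, 5, 6, 7)             # cycle lengths are 4..7 input clock periods


def cycle_ns(select):
    """Cycle length in ns for a hypothetical 2-bit cycle-length code (0..3)."""
    return MULTIPLES[select] * BASE_NS


def shared_cycle_ns(cp_select, dtp_select):
    """Both processors request a cycle length; the longer request is used."""
    return max(cycle_ns(cp_select), cycle_ns(dtp_select))
```

With these assumptions, a request of code 0 from one processor and code 2 from the other yields a 150 ns cycle for both.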
wfll oee the 1 

ne dock qFde <M etoo be eiAeoded Iqr e Ni^ iipHd.* 11^ 

th^enl 

» tb^r eve off-boerd end theto 4 
For emdeb the VMB 1 

port to 

I (ee dMwn In FlKure 48) with e 4 
e oMmaqr boerd which «Mee the i 
flo tte nwmnf y ^ iMod> i nrnm i to the< 
to the prim ei j oecfae 140i la 

the<9detoi«th wSbei 




4814 with oo^ aodeet hi 
The hoet CM eottferoi the dock 
eetoct whether the dock 
the hoet cen 

110 



^ the VICE bue hiteftee. Tbe hoet 
200 free rune or to stoftped. In the 
^ the dodca Note thet both the 
nodide 120 wfll be dcul 



The flnel control iixto the dock 
the p^pdfae dock (which bdm 
WCa ee it eBowe these aetione to 



to one tiMt infaifaite eQ the< 
r>. Thto to need when kwBi^ (o 

the toterud aketo oTthe 
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in vvkui ptaoM above. K^f 



The hM cn tehe control of ih» 

fbr write. 

W Thehoeto«nlaedaodr«db«*tli.WC8M>. 

T1>e eopttei nrnr wia maOuim iin P .iy>#yu^ ^ vm« 

Oi^ CD intemqit <hn the hoM M a 

•■ppoft ii fMvlde^ with no 
thet«reMt«t«7< 
U <3>wl08lelipro«UMiweii 
the oMoifeor 

AM regteten are raed/Writfl^ 

for 

A 

ao Aflintemvtocnbe 

llie Jaternal atateer the critieal lotfe po(9a CM be 

^ — ' "intrTrl |ffniiiMia muJuli 110 



86 



110 hM 

htfeftr 



bua 311, both for Nad Md 



OQ the 



to be 



PlyaeaBihowe ac?hamatirrf^ theorgttiatfan of the] 
«.WCSMDandtatheWC8e«tanrfo««k«^ ^ 

nwfced with a • oo«e direct^ the WCai ml a» pipeined 
thtgrateeantrolBQ^ -me ItaoM are r^glatwed at the output of the WCS 220. 

Bit fldda marlnd •» are pbyaicnltr atored In the WCB mMtm^aa oo the FP 
80 nMU^batarepartortheGPinkroeodeword. Moat of thaee sdtroeode 
the module aelectioo lo0e^ and wffl hate no eftct if the FP 1 
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or Ufa 



Tbm tout mimber oC oderoeode bilt raflnUe m 96 pha «d eim ptr 7? 

k>8k»i operttoa or the IPU240 faHegBr prnr eM Oi . Th» sOoeMtkn orbito wWiiD UmAM 
iaeneoded AD In^ruclkiM an <moM la th* botten 84 bte Tlw lop 8 bite m oo^ 
uMldurtagthfttiuiteordstelatoUMlPUSiO'arog^ ^p^^^ ^ 
Add wt^ be Ibitod In the Miii;AKfeuMr't dite aheete fbr the IPU.) 

bf the ADSP 140L (Father dsiA oT tl^ fldd ^lIT^Tl^^ the 
dele eheeto Itar thai pert.) 

f n^t .^rfcMi firti ntn • nik field li nak^r iMd ftr 
to the i fwyienceff, but oeo elw be uead to (in e 16 bit 
nwHt e nl ndae onto the deta bee. Thto m tbm be loaded into «v of the ] 
ti^bm 
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ao 



or tte oootrol pf oc e wor ooduki 110 holdhtf reglitan to b« the source 
I for m data eaeha tntiuAr. 

Pnm nrtWt immWIT IfflWttLtai These bite ^ed^ the oumhcr of ^ 
final the FPhokfii^ rasbter to write into the deUcttchen^ .^le Bret wor^ bi the 
hohflng regirter Is spedfled by the leert 



Condition roda ariaet f}}) aaleete one of the foSoiriiig 

oodee to be tested by the sequencer durn^ e ^'^^^Hai inetniBtka: IPU 240 i»i>.w4^rt np 
code output; f^^Metue (ectusl stMae selecM ty « sepe^ 
10 ]oop( write Ilo0s C2 entriee); hold staiui 

CD hum mirt^ (3) Thi> ^^^^ ^ ^^^^^ folknriiy i 
or diwieee todriiw the CD buK IPUaiO; Ccmnttid 
BMBMy holdinff registers Ml*; Mode regtoten FP modide*; Addre 
pioft; ^drtf IMS geoertor dete port: aaqnang^ Amtm n^^^ y 

16 Note thet the p«tk!ulftr register or buflto to use ae the aonmie I 

fMd^ 

~ tUs fiMsslsctsoQeorthej 

thedstftootheCB bus 1 
hobSng register; Mode register FP module (tl 
20 OS the souroe Is specified 10 another field); 

port; Sequencer data port. 

Tlie iro 240 is not inchMled becMse It can the dWn on the CD bus at ^ 
CIUs ftnction is oontroOsd by thelFUtBstruetten fisid.) 

AddrMejeflirter eontwd fK** ^♦h^ ......^ TmsMns rhn tueiliiitt 

2S or the nrtitrs ss register and the othsr bit eosblssrendhMsh of the register for use by the 




or devioeees the 



port; 



the 

CZQ) 



The most significant bit (broadcast select) controls how the other
two bits (module ID) are interpreted. When broadcast select is 0, the module ID selects
the single module which is to respond to a data transfer, either with the data cache
or the CD bus. When broadcast select is 1, the module ID selects which group
of FPs (or algorithm accelerators) responds to a data transfer. This allows the
same data to be transferred to multiple modules [...]
than individual writes. Note that this is only valid for transfers to the modules.
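
The interpretation of the 3-bit module select field described above can be sketched as a small decoder. The function name is hypothetical, and the sketch deliberately stops at splitting the field: the text does not state which modules belong to which broadcast group, so no group table is assumed here.

```python
def decode_module_select(field):
    """Split a 3-bit module select field into (broadcast, module_id)."""
    if not 0 <= field <= 0b111:
        raise ValueError("module select is a 3-bit field")
    broadcast = bool(field >> 2)   # most significant bit: broadcast select
    module_id = field & 0b11       # remaining two bits: module ID
    return broadcast, module_id
```

When `broadcast` is false, `module_id` names the single responding module; when it is true, the same two bits name a group of modules instead.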

Itfmftlln ifflm iiiiiilii fll Spodltoi wtiether thp laoduto to setet l» dtflned 
iodme Mieet add orl^ tho mode rogistor. Thit alkmo tht aoditit 
to be a^Mtod on o eydo fay <9cW boiii «r hmm glofaoQf . Ibt tfobol method la 
when tbe work cMi bo done « eiv of the PP oMidiita pweeat Mid 
inodule 110 picks the to uee bofbro It eterte the I 
nt amOoble then there woidd be e 
to eotfOipoQd to evety FP 1 

aaiteiBLlll (**iMi««oontr*^)aeii 

to pMoad to the 



irtheiloM 

UO 



15 



flBLlll Debiig oee on^ Set to irtno tha logle wh«i ^ 
oueartedu TW. cmee en interrypt to oeeur dirt- the n«t 



tothedebiv 
TUe elows « ueei 

thedoefca onend oO: 

120 tofltteite 

m to 



to 
to bo 



thei 

^^9^ wtthouft Vhpktify 



the 



S6 



or devieok 



in the 
Mult oTtUa ig to 
qoeue to find ite 



This only has an effect when a 16 bit [...]

This signal selects whether the data is zero extended (bits
16-31 set to zero) or sign extended (bits 16-31 set to the same as bit 15).
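
The two extension modes can be illustrated in software as follows. This is a minimal sketch of the arithmetic only; the function name and the boolean flag standing in for the microcode bit are assumptions for illustration.

```python
def extend16(value, sign_extend):
    """Widen a 16-bit source value to the 32-bit bus width."""
    low = value & 0xFFFF
    if sign_extend and (low & 0x8000):
        # Sign extension: bits 16-31 copy bit 15.
        return low | 0xFFFF0000
    # Zero extension (or a non-negative value): bits 16-31 are zero.
    return low
```

For example, the 16-bit value 0x8001 becomes 0xFFFF8001 when sign extended and 0x00008001 when zero extended.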

BriUBM&iU Tlite bit iMvente the up<kting oT the 
rngirtai that nonnllf foOom the ateteoT the FLAG raglMer buMa the i 
Komel^ tfaia ndmkr bit fbBowe the 
30 MTvlee the aaiBde bit ia prevented IhH 

to be oorreetbr restored when the Imem^t I 




aio. 



a atateb bat diirii« en 
nia peinka the FLAG 
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110 



ripMti <Wv« Um eoounoQ eooditta ODda liM 

Tbe fbOofiHag an l eiarte d: CPWATT; PPWATT; b«iik> 
loops cp.wBit.iBtcmipt; Q^.wait.l&temv^ fpJbnU^obH; ud f^.amr. 

BsrittSJSteliSL:: TbeM biu adeet whtt 
444 on tha FP module « to b# wtti oc wfitttti vte th» C© bw 
IIm ngtem and buflta m ttaurflw control 

; iMtnicttoD wgfcrtor (8 bto); StotoM rtuNar (aouw op<y). 





tho boltfng roflMcn 480 (oo tho FP nodule) «kd tfao fP aodula's flHt 
Oni^ a britf doMviptloQ of 6M!h subM Indo^ 
beooao appmngt ooeo the ovm 

the holdtat Ngirtero 420 oDd the nsiitor fifo 48a Tlie directkii io 
n^tUBt to reglte file or r^gliter Oe to hokft^ icgtoter. 

IkBOifiK.niihiftIU Tbia btt slarta a 1 
the otlMT flriMoode btta and the ragiatared eootrol bita On the 1 

of the fiwt ^ In the ragiatar fite thai data ia wed from 0^ 

fla eddraaa la to be modified to ImplanMnt r^y^f^ iogloBi or preview i 
wUeh ar« afi oonmiied with how the regiatflr £Da k te^ 



q^aa 
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wnMh.> »««if^flf rtirt niMrw iimlTr m Thn hniiiim i imiimi ,t^t 



addTM ena he ^MdflMl to ooM fraa OM or Ubw Muieec 

1. Ptwa th» CP mk^«^ «UM „ .fff prrrimw iwi^anli. 

2. Ftan « Add hirid la tte FP Mdula'a tn^te NcMv, or 
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tfouad a aa bUlDtoger Prooe«or Uail OPU) aiO oooirollad • 16 bit ■aqueneer 3ia 
The main data path within the data transfer processor module 120 is the Transfer Data
bus (TD bus) 122.

Many of the portions of DTP module 120, in this embodiment, are closely
analogous to portions of the control processor module 110. In general, corresponding
reference numerals have been used to indicate such similarity. Thus, a sequencer 310
provides a sequence of microinstruction addresses 311 to a writable control store 320.
The sequencer 310 not only interfaces with the TD bus 122 through register [...]
receives condition codes through a multiplexer 312 and receives interrupts through [...]


«ut|aitia21,aodaragirta»ado«tputtoatopfortd^ I*Ma 311B 

and 225 provida addrai and dau intcr&co from tha hoat to thk wriubia ccoCrol 1 
a20»aawfflbadeacribadbalow. (Una 324 li a writa anahia ftia^ uied in 1 
Sarial/lparaM shift ragistar 328 rimdowa tha intanml stata of 

iputa. A 16.bit aaqnancar bu. 315 •!» prand- ld*it h^ 10 tha 
3ia TWainpiAiaabuflfaradiniwt. whfchcanba»ad,a.^to'fa^ 




The microinstructions 321 are provided as registered input to decode logic 360
(via shadow register 322). The outputs 361 of this decode logic are provided [...]
inputs to the integer processing unit [...]

interfaces 160 and 170. In particular, the [...] outputs of decode logic 360 control access
to the TD bus 122. Note that the TD bus 122 provides a data interface to the [...]
interfaces, and also to the cache memory 140. As with decoder 260, the [...]
decoder 360 are not separately shown, because they are so pervasive.

The integer processing unit 340 is preferably a Weitek XL8137, as in the control [...]

[...] generator is needed in the data [...]
[...] is not so critical in this module. The integer [...]
340 has a two way interface to the TD bus 122, and can also provide
[...] outputs, through register 341, onto the TA bus 121.
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of IhiftMt Ptml 



to order to •chtow trwrfw eycte bet^ 
port, wvond telora M oiAerod te 

aikl the tfwite oootrol imirt wipcad to ^ 
tUnc tiMM ilffMK to lywnhinltn a with Uw p^iaMafa^ of the condMoo cod» input to 
the «*4iiuiw, wffl aoQMtimflo eoiiao the tmute to ovtmtD fagr ono. For the tmteo 
intooFIPOthtoloiiot«proh*tni,boo«iiMtheh«lflWlft^ UoooTtho l^lfM 

Oit moM» thrt thaw to ptetfy of «p«i^y to 

Whoa roodtof from tbo FtFO^ etbor ftntflgftga mat be i 
FlFOo with -ompiy r Oi^ or dthfftaC the 
I ii> y.i,,^ , ^! ^ rrTTfmrfifl ■m fa wnimiu , duetothriri 
Two OMthodi «o provided to aolvo tfato 
to tned dopoodi on whtthar the -«-*'iMr1iiTn to « 
I to thid a wfito opemttoft en be undooe CO o , 
ii^ ft data tnoite from FIFO to aemoiy to 4 

be ovenvrittan ee soon eo j 

e. WheamAQgenit^iiMraeaii9HMmla0eel^iBodl^ 

The molt or tfato to tfaiitp if w etttn^t to node to leod n onplj 
to ectnellf portemed. Thto eflooo FIFO reote to ovema withoiit 
The FIFOe protecd themeehee from rootto wheo thi, ero ao9^. but 
ie neeeeeeiy beeeuee the other Me of the FIFO mVit bo writteo to 
reo4 ood thto woidd toee det& Umi^ wheo the tnoiAr 

lao to treotetag dete toto o oeaioiy. it wfl stop irtwa the FIFO h^ 
At thto poiot ea ovwTUtt wfll hovo oocurred. The DTP moduto 120 ceo 





^ """^ «» When deto bnomM «»i.m. in yip^ ^ 



teeifootfaiivhad] 

b. When writing into a FIFO, the write operation cannot be undone, so a
different method is used. The microcode tests the status of the source
FIFO (and obviously the destination FIFO) before every transfer. In this mode the
[...] is much slower. However, if the source FIFO ever gets more than half full [...]
to a fast transfer mode. The occurrence of this condition [...]
that up to half the source FIFO depth can be read out without [...]
mark. Therefore, in this mode, the [...]
without stopping to check the status. The status in the receiving FIFO will still need
checking, unless it is less than half full. This same technique of switching between slow
and fast transfer modes can obviously be used with memories as well.
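
The switch between slow and fast transfer modes can be modelled in software as shown below. This is a simplified sketch under stated assumptions: the source FIFO is a `collections.deque` that nothing else writes during the transfer, the destination is an unbounded list (so the receiving-side status check is omitted), and the function name and return value are hypothetical. It only shows how the half-full condition reduces the number of status tests.

```python
from collections import deque


def drain_fifo(source, depth, sink):
    """Move all words from `source` to `sink`; return the number of status tests."""
    status_tests = 0
    while True:
        status_tests += 1                  # one test of the FIFO status flags
        if len(source) > depth // 2:       # more than half full: fast mode
            for _ in range(depth // 2):    # burst without re-testing the status
                sink.append(source.popleft())
        elif source:                       # slow mode: one word per status test
            sink.append(source.popleft())
        else:                              # empty flag seen: transfer is finished
            return status_tests
```

In this model a 16-deep FIFO holding 12 words is drained with one burst of 8 words followed by word-at-a-time transfers, rather than one status test per word throughout.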

2. A three way branch instruction in the sequencer 310 (called
BRANCH) provides a convenient way of keeping track of the number of words
transferred and testing the FIFO status signals. This has the advantage of not using the
IPU 340 in [...], and thereby [...] the cycle length.

3, Far opthBum data tnuMftta to or flpom the iktn cache Mttoty, the 
► to the TD bi» m ia bnffbred ki a re^ bank SaOB (aeen hi F%iw «^ 

oVitdabitreclrteraL Thto lAowa Ad^ ] 

t^theDQL Tbia^aaatoralbf«akki«vbngtrHHte(>81 
becauae there ia no double buOWiv kft thto "^mdm 

120 wffl theretoe be forced to Attpead tnntea imtfl the aanocy cycle 1m» oeamd. 
Tlila break wfll happen more frequent^ when cent^uoua trw^ra are not med and 

4. The arbitration of the data cache memory is decided at the beginning
of the CP module's cycle. If the data transfer processor module 120 were running
asynchronously to the control processor module 110 (to allow instruction dependent cycle
time), the data transfer processor module 120 might have to wait up to 100 ns of [...]

be much longer, because the control processor module 110 has priority, and the DTP
module 120 must wait for a free memory cycle.) Moreover, the pipelining of the
microcode instructions and FLAG input to the sequencer could introduce yet another
delay, while the DTP module was looping to see if the transfer has been done.

To minimize these delays the control processor module 110 and data transfer [...]
120 share the same microcode clock generator. Both processors ask for [...]
cycle time, and the clock generator chooses the longest one. This should
not really degrade the average speed of
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10 



10 



20 



tteFXFO 
») will tmd 
id^Mlfl (FIFO fliB, FIFO 

Thk iBMrts tbft 
th«iliJua|Mdte< 



lii>tniette.esmMlath»tbortM(9^ To ay«m« tto dal^ when looptoft U« 
vfattntkn la dons mtog aoo-reglitflnd mkroeode raqoest Uta. 

Oaa Twy iniaor dmwbadt thk app^^ 
iauaeditwffl alfoct both prooeaaon. 

s Thu% wben a FIFO ia either thaaoum or tlM< 

statua dffiala (and, to aoma caaaa> the daU cncba mmm^ ^^-' ^ 
to ba moohorwl during n tnnalbr. So thai theaa four i 
halMbU FIFO en^, cache aoeeaa granted) cm be 1 
nmAr procaaaor module 130 baa miAht^ 
to be teated into the Jionp ad^eaa. ao tlHft the I 
QQ the atatna during that (yde. llie three FIFO atatc 

bit^andtheertteatkmaignalniitoupthe thWWt. nia ptoHdaa aa 8 1 

When oQ^y the FIFO gtatua la ot intereat. the aMirmikm^ fdi^i ^ 

^ , ^ ^™*** *P»ei can be iiaablui^ eotfae 

iiiulUni^ branch ia reduced to 4 waya. 

a T^««id the date tranateptooaaaormo^ 
the data cache n«iory 140 when the control p««aaor 

» itring it on every cycle, «i interrupt baa been provided. When the iktn tm^te* 
lodttie 120 18 denied eocea^ it atwta loeping oo the 

A tinwout under tWa condition CMiaari^ be teatedfer. Ifei 
Bite proeeaaor module 120 can bitemvt the control j 
Thia wfll take the data tramllBr proceaaor module 120 out of the i 
end tfaua let the data trmMAr proceaaor module 120 in. 



ua 



26 fat the preeenttr preferred embodiment, iaeaaentlaQT^ 

the IFU 240 or the control proceeatag module 110; which ia eitenai^ 



00 m 



310, in the preeentir preferred 
210 or the control proceeaii« module 110, whMi ia eitettai9e(7 
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micTo 



Mipport loglo cub Into 4 mt tgDr loa, IntemipU, *^»*^*^na i code 
bus ttDd oaosUni / next ■dftrnni 



Intwtmtf 

313 l» uMd to oteMl the muBber to & Tbo Imomspts no used Ar 

fornmnntotta i and to support debuggli^^ toota» 

faiternipt Mureoa m (In orte or bWMM prkrlly firM 

Clay Itfgfc fgrf ffiTihWTtnl' Thk intamipt level k thared btww 
10 thod«w logic aodttohreal^ logic. TV toctioo. of theoe two mtom^ 

abovo, iD eooaactkn with tho Ametka of the Mquanoer 210 la tho coutnl 



16 



tlM debiig monitor and ilMuM iMt bo iMd 4iv<^ 

VMB ham fei>-->^. 1)1^ Bt«TU|* level to 
is stored ia the ooomend regtoter. 

rnntrnl HirnriHlll fliiumminill Ihto providM the aonua method 
ivfaereby the eontrol proeeaeor mo^ 110 cn lolbm the <tote 
UOUwttberetoecommaiid toi thedtp.eonnndFIFO. *• 
^ fllta FIFfh '~T ^tmirrt Imrl to iniiil hi nmii> Qu 

• module 120 that tlM dete FIFO In the VMB j 
they heve received eome dite (the input FIFO) or have nm out oT dite (the 
output FIFO). * 

qgtetgfiWWTli><aPiPterTupt togeiwfetedby the 



Thto interrupt level to uaed to notify the date 
one oT the FIFOi in the ditfa pipe 




do fayeq^oftfae 



cento (e.^ bulk 
Note: The 



Thto mtemtpt to reeenred fbr use 
csfd or network). « 

with * oen atoo be teeted by the 
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nonud eonditta eode logkti bo that thej esa be polled If tlutt ii preferred. 

For the interrupts marked [...] the situation that generates the
interrupt condition can be detected by examining [...]

FIFO data structures. This can be polled if interrupts are not to be used.

The use of multiplexer 313 to expand the number of interrupts forces different
timing requirements between the four higher priority interrupt levels and the four low
priority interrupts. For the higher priority interrupts to be recognized, they must occur
25 ns before the rising edge of the microcode clock. For the lower priority interrupts,
the deadline is 15 ns before the falling edge.

To generate an interrupt the corresponding interrupt input is held high for one

clock period. No hardware interrupt acknowledge cycle is necessary, so the [...]
hardware is very [...]



The sequencer has a single condition code input called FLAG, so all the [...]
signals are multiplexed into this pin. This is registered internally and has the
normal set up time of 10 ns when IR0 is masked (counter underflow interrupt) or 20 ns
when enabled. The polarity of the FLAG can be changed inside the [...]

The inv oBodide^ like the CP owdula, ooniahM odmie reglater ^ 
afaowa in FlpM 40A. (Tliia anraUa proUema with retimii« hiterrnpt lHndlfa«^ 
The teatahle statw A^^mle m 

IPU 340 condition code output (COND): this relays the status of

the current instruction. The specific condition that the IPU 340 outputs on this pin is
coded in the microcode instruction.

Microcode loop. This is a status bit in the VME interface control register
and is useful for [...]



FIFO status signals for the following FIFOs: Data pipe input #1 (half full
and empty); Data pipe input #2 (half full and empty); Data pipe output #1 [...]; Data
pipe output #2 (full) *; VME data input (half full and empty); VME data output (half
full and empty); GIF interface (input) (half full and empty); GIF interface (output) (full,
half full and empty). Signals marked * come from the receiving FIFOs on another
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tnterftoe 160 to 
modulft's — ymp^ 310. 

OBCAbustcror. Ate fUtui bit acth* wim U» DBCA trv^ oa 
thtVMBbu.giU«l»rt6da.«««ihofabtt«efwoo^^ TUt moit HMf mmbo 
te tfaJt «rar to UMt 



Ite OQND tem tbt IFU 340 to Y«lid too toto to sMt tbB I 

lip tto» (oipectol^ « It wa be <leto3F«l by a m«^^ 
rUktonoditka thoctekwflliMdtoboi 




ItolSSoiL 

: *fbr loopr to boit (koa by QifaV 00a of tfao ooiatm kMi^ to 
^ftoaiDgtliaIPU840ftraaihaaacakiitotfaaaL Obfloui|r tfa^y eaa bo 
uriOf tlioIPITa40^ but wtth tha«itmov«tfaead«ra 

Tbacontttoa eodaa ato ndfe(ptond» vto an 94 to X 
iDpiit la aaqaeaBar 8ia Tlio 
ortha 



81^ into tha 
tha 




I boa 311 ca& ba diivaa fraoi two aoureaa: tem 
d froea tfaa VBffi bi» wban toad 
t cf tha nkroaddsaaa boa 311, to aaa 
llria to dona Mgra^rawito^ to tha aaquean 
rdtognoatiea. Tha aitamtooortfatobii^ item aa Una 311BI to 



310 
Tlia 
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to the HiMk DBterOm Logic 160. 



This is used in a fashion quite [...]
with sequencer 210. In the DTP module 120, [...]
an enhanced multiway branching capability. This logic (and its use in multiway
branching) will now be described.

The embodiment shown in Figures 3A and 3B includes some [...]

for multiway branching in microcoded systems. Figure 3C schematically [...]
the microcode operation used in the presently [...]
multiway branching without address boundary constraints.

In Figure 3A, note that the constant/next address

field is not only provided to buffer 317, but also provided as an input to multiway

branch logic 318. The multiway branch logic can manipulate this [...]
[...] a novel capability in microcoded architectures. Other inputs to this multiway
branch logic include FIFO status signals, and also [...]

vary the increment between alternative destinations, in the multiway branch step
([...] by sequencer 310).
Figure 3B shows somewhat greater detail. The constant field (16 bits) from the
[...] bus 311 is split, to provide inputs both to PAL 318 and buffer 317. A
common enable signal is used to activate both of these, when multiway [...]
is desired. (Of course, the sequencer bus 315 has many other uses as well, and multiway
branch operation will often not be desired.) Moreover, the constant/next address field is
also used very often for simple jump operations, and in such cases the multiway branch
logic 318 is disabled.

Figure 3C shows still greater detail regarding the internal operation of the
multiway branch logic 318. A variety of condition and status signals are provided to
condition select/encode logic 3010. This selects and encodes [...]
three bit signal which can be used for [...]
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Tbm maUmty bnaeh loglo la wmtwOeJ bjr tevwal 
<lMcribMlia deufl below. 

It is particularly advantageous to use such multiway branching logic in a data
transfer processor like module 120. In this case, device condition signals can be used as
the condition input to select/encode logic 3010. This permits [...]
[...] control over a quite complex interface. When a status [...]
[...] the multiway branch logic can [...] transfer to the appropriate

routine for handling the condition. The multiway branch capability permits the
sequencer to test the conditions of several devices in a single cycle. This means that the
DTP module 120 can perform a data transfer on every cycle.

' ^ * "ioilB eomBUan eede (FIAQ) liyiii to ant^osnplK 

tothapw-ntjirpcto^i wnhnrt l n tha fapute to oalort/tooad. aoio 

Mtea Wte ft«a fbor aa dateflad bekw. BoiMvar. or< 

iwiatr af othar iapiit tii r ai m m iiiila "tM banaad. 

NotethaaaaUftlpputiapaovidadtothaahttaadMgakglB. Thh| 

m tha da a rtn a rt ona of tha midUwtf bfch to ba waiad. 
Tha riiht Ida of ?yg^ ao oh-ia .rhm.lli ,Hy tha> tha teqwr MO baa 
[capaMB^. TUa oapiMtty; fa oaadtetka with tha ndtf«^ bnaeh 
boMPdaiy rnmif alma caa ba igawd. lUi ia partieiiki^ 
iaadatetmaaibrpnoateor. Sta- ««h a pnoaaaor ba abl. te 
. . bitfi ft«tloo of date tnaalte^ it ba doalwd to inchrf. . la,,i Jha^ 

^ "n-*"* b««l.i,o,«„„tete««,U«. 

Mgb pwpottiflD of aadi tait ni etlcn a caa ba naad. 

Tha pfaaaatjy pwftfiwIaaUwlnMm uaoa tha p«v« co«n^ 
joapdeattetta. i» ft« -aar fo^taoartatf^ ^ ..^^ 



tha baaa da^faaHoo a ddrf waa wqtpBad float a atftwat 
'»aWOBiain«la«p«ia«thao*«oodaaiLawhilaa. ThaaapM^SKt, 



veniona of tha oodula «« '°*^i>>S 
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tteIPUa40«iid 
QOQ-reglstarod 

WCS iatcrftee 
regbtm 223 nd 229 



310 hm their own InteriMl piptiioe raglitan «^ 



322 and 323 Amette aaalogoua^ to tbo WCS interOM 
d^MrEbod ttbovt* with r egwd to tho oo ntf ol 



Aa fanportnt ftmettaoftheDTP 
costroamii In th» VBfB infeMte 160 aad 
welL TUi flmctte wiA be dneribed In 



120 !■ 
In coo or 



ODO or moM DICA 



10 




TD Bus Decode

This logic (which is one of the most important functions of the decoder 360)
decodes the microcode TD source field [...]

devices that can drive the TD bus 122. It also decodes the TD [...] field (as

qualified by a write gate signal from the clock generator 250) to generate strobe and
enable signals. Most of the ports on the TD bus 122 can be read and written, so
[...] registers on the IPU are not required.

Possible TD bus sources and destinations include: IPU 340; VME Interface
Memory; Command memory; Data cache memory holding registers 560B; Mode register
(8 bits); Sequencer data port; Constant / next address field (source only) **; VME
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dat* FDO; Drt. pip. i; Dal. pip. a; Off FIFO ••; loMrrupt vMtor ragliter (8 Uu); 

DMA control register; DMA controller - address counter *; DMA controller - word

counter *. Sources marked * are decoded by the DMA controller instruction [...]

as part of the normal TD bus control field. Sources marked ** only drive the lower 16

bits. When one of them is selected, the sign/zero extend PAL [...] is also activated, so
that the [...]



[...] is either sign or zero extended up to the bus width of 32 bits.

Note that only one source and one destination can be selected, and they must [...]
The transfer of data into the IPU 340 is under [...]

[...] so that it can take data from the TD bus 122 at the same time it is
loaded into [...]



Ae noted abom, the dock genentor an predueee the heeie eleck 
the data trwiate pneeeeor module laOi 



Moot rfth. debug har*w Induded Jn the « innmr b™---. 



In wious ptacea in tlia ^ ^^ \ lr^ i 
> for coiiMauieuce: Kbit coatyol of aiernaAb— boa 3U - bothi 

atlwlfaki, HifJwM^ hrwahiniinl mipport with no rwrti li lliai on th^ tmmh f <^ 

that aca iot at aoy ooa tima; Ctew lo^ wUch pernrfta ^ 
without rfaglo atoppiag tha Bio^ 

^ Wtoff microeoda; AO Intamipta can be aeleetmtr enabled or 

to the internal state of the critical logic groups to allow the complete
save and restore of the DTP module's hardware states.



The ttiaooode word temat la BBneraQr ahom in Figura dC; and la 

Items marked with a * come directly from the WCS, and are pipelined
in the devices they are controlling.
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irf ThiPiml PInil flntfl— lAL 

ItM total Qinabar of miffoooda UU mOftbte are 9a Moat of the UU art uaed, 
but thara are a fym aparaa tbat banra not been iq following f M ^. 

IPU operation field: This field controls the arithmetic or
logical operation of the IPU 340 integer processor. The allocation of bits within the field
is encoded, and details will be found in the Weitek data sheets. All instructions are
encoded in the bottom 24 bits, and the top 8 bits are only used during the transfer of
data into the IPU 340's register file.

Sequencer opcode (7) *: This field controls the generation of the next
address by the ADSP 1401. See the data sheet for the instruction set.

Constant / next address field: This field is mainly used for
providing address information to the sequencer, but can also be used to place a 16 bit
constant value onto the data bus. This can then be loaded into any of the registers on
this bus.

Multiway branch select: This field selects which set of FIFO status
signals are to be used during a multiway branch operation. The sets are:
VME input FIFO; GIP input FIFO; Data Pipe 1 input FIFO; and
Data Pipe 2 input FIFO.

Multiway shift control: This field selects whether the multiway branch status
information is inserted from bit position 0, bit position 1, or bit position 2. The
various shift factors allow for each entry point within a multiway branch to be 1, 2 or
4 instructions long respectively.
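
The effect of the shift factor can be illustrated with a small sketch. It assumes (this is not stated in the text) that the status bits simply replace the corresponding bits of a base microaddress, and that a 4-way branch uses two status bits; the function name and parameters are hypothetical.

```python
def multiway_target(base: int, status: int, shift: int, status_width: int = 2) -> int:
    """Insert `status` into `base` starting at bit position `shift`.

    Illustrative model only: `status_width` = 2 corresponds to a 4-way branch.
    """
    if shift not in (0, 1, 2):
        raise ValueError("shift must be 0, 1 or 2")
    mask = ((1 << status_width) - 1) << shift
    # Clear the bits being replaced, then drop in the shifted status field.
    return (base & ~mask) | ((status << shift) & mask)


def entry_spacing(shift: int) -> int:
    """Number of instruction slots available at each entry point."""
    return 1 << shift
```

With a shift of 0, 1 or 2 the entry points are 1, 2 or 4 microinstructions apart, which matches the lengths quoted above.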

Multi^ h>ii«rfi fepyiMfW ^ TM, >^M^ ^ .tt^^Ki^ ^ 

cache memory eooem granted dgnal from beii« combined with the FIFO ftatm. When 
it ia not und the multiive^ branch ia 4-wqr and whao it b uaed it la d-«^. 

Cycle length: This field selects the cycle length appropriate to the
instruction and data routing selected.

TfftiW fT> * ^ '-^ ^ timnrrr nn ctrnjoi iu Um ilaU 

oadM meflBoiy ia required br the data tmfer jmrnnaui module 120. 

Data cache writa mmtit, fi> tm» > — ^ in tho daU 

if aooeea to the data cache memory haa been granted. 

t Tlda bit Of er r M e a the normal write enable fftli^ 
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thM dtoi,. updtttag of wofdi In tta dito c-a» 

pooted. lUaiiunAdwlMBsettivbloekiaraamaiy toaeaiHUiitvaM. 

TTtmtitim fflh wtwt ta tM» add ■<i>eu of tha foan»inff >«w«*vm 

oodutabctMtedbjttenquoooer during aeaoditioial haMniGtian: IPU 840 condWoa 
cod, •«»,«; nto^XKl. loop; -rtUfliBi (a «««rt;D«.p|^ q^^^ 
Ikrt. pjp. Input PIFO#a (half ft^ 

tf-D; pipe outp«t PlFO#a (lUB: VMB drt. Input FIFO 

<wtp«t nro ftia «rf «Bp.y): OIP talartta. Onpot) (hnlf (ta^ 
(«alpiit) (ftHL hiif Ad. eap^r); Ditn eneh* aonofy ^ , 
ipniioo intarfhee oon&llaa eode M^mk WA bw amr, H«M 

"rtm fit ut pNvwi* th* i^^tk^ of th» 

thto adniie Ut fiOm thn 
thftodniB Utisp 
to bo oorrectty restored wfaon the 

ID^UUBUBBill TUi Odd I 
to M«* thnID biM: IFUSM; ' 

flai« SwpHocar date poet; ConalMl / 
1; OMn fip» * <aP FIFO! iBtamipt 





OUA oooMtar • mMm eontM. » 

1 nil floU seleets on* of tte 
buOim or dMieea an tb* daotfanUoo of the ^ on the ID buK 

VUBlntarteeaamaqf: D«*. eacba oian«, Wdlng wgiatar. Bloda 

Saquaaear date pott; Conatant/ aokadikaaa Odd; S«|naneer data port; VMB date FIFO 
Date p«pn 1; Data pip. 2; dP FIFO; Intamvt ««tor ngiatar (8 fata); DMA 
I mCA eontMlbr . Mhheaa coontar. OUA I 



t -— •rL-— nwi^mr • omoM eoontat; miA wwitroOer - word oountar. 

The IPU 340 is not included in this list, because it can "grab" the data on the TD
bus at all times. This function is controlled by the IPU instruction field.
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of Um 



E Om or tiMM control bita ( 

> and tho othtr bH enahte MAvk the I 
and debug aunitar. 

BrrttkfioHit (1) (** Debug u— tmh, *• ) *V 4r!rty monitor topleee 
afafeakpoiniooaniDalruette. TWi cmiaea an Interrupt to oecur durtaff the inatr^^ 
-0 that control i. peeaed to the debug inonltor mie^ ^ 

Ckw fl> IVhiig umm ^ a«t to f,*^ ^ 1^ 
eoaertailL Tlda emieea an tntemipt to 
10 thnioanMiapMBedbacktotbe 

Thia aOowe a 
: the doelBa on and oO: 

110 ad one qC three levetaL The lewto m 
16 ft«n hoal reeeived; Date tm^br Uil 

register or device is read. In this case it selects whether the data is to be
zero extended (bits 16-31 set to zero) or sign extended (bits 16-31 set to the same as bit 15).
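
As a minimal sketch of the two extension choices described above (the function name and the boolean flag are illustrative, not taken from the microcode definition):

```python
def extend16(value: int, sign_extend: bool) -> int:
    """Extend a 16-bit source value to the 32-bit bus width."""
    value &= 0xFFFF  # keep only the 16 source bits
    if sign_extend and (value & 0x8000):
        # Bit 15 is set: copy it into bits 16-31.
        return value | 0xFFFF0000
    # Zero extension (or a non-negative value): bits 16-31 stay clear.
    return value
```

For a source value of 0x8001, zero extension leaves 0x00008001 on the bus, while sign extension gives 0xFFFF8001.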

i >MA ContffiHii^ hlrtra ahtt ii ) flrW rmtfmla ttm liliia ihaM m u>, 
the inauuLtiune evefl e b ia ere pwwe i uod with re^fli^ m 
re- biUielli i ng the reg|Mr% nd the nomi4 VMA 
(or decrementtatf the ed*eaa end deerenenthv the word 




aO DMA 



the 

of 



26 



00 



oe extendi the 
telogteon the 
iboerd. t^rpieeluBaeoftyamiilit beto 
^ Interiboe card. 

bua ittterflMe la eleetrieeQf end 
I <m the GIP ao thay can ahare i 
conneetor k a 90 DIN 



off*board. Tliia 
boer^ or to 
toabuOc 



identkid to the 
theo^psala oait\ 
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Ttir^ rfto ctato [Note 1]; Wpdta 

[31; TD but 122 (32 bita) [3]; HaMk; WCS outpitt efuOito; P^ellM rtgb^ 

WC8 write eoftbW 324; PipellBtt regliter mods eontrol; Smkl clock, SeHal daU la, ud 

Srtal d«te out (used for mkrooodt kmdba^i BitenMl intemipl (4]; Interrupt 

ackncmiodKVKiCooditioaoo^ Att ilM m m kv«te aaeept vim aottd: [1] 

ThMilffMliMdiffmBtlidECLlovelt. (2] ri^ials tttM ECL Isvtk 

(3] '»»bu.ii3abtow4da,butc«ibecoorider«lforto» 

bu«--ctBidtlieprinMtty d«tebui«ndth#«acood«7^ [4] Tba* Hgoate m 
Mveo by opoQ eoOsctor buflbn. 

TteuMoTMOM BGL ri^Mk in this loteffflM is laeAd Id i^oniiiM *titi 




oCtlMi 
> tbftaodulB 130 wffl 
(or TP ndihiltO. Bomw, tlite i 
for other dMte ^rpe% M a < 

•ritfaaetae. TtmforOp Ukia modute wiQ 
i a nuierfe proeMor modute isa 

tbol 

^coupted toCbo< 

randlhtoKfaamesMy. ndoeontnl/lotateA logletei 
dock tlw ooBtrol rrocBMoi. ud te prefonb^f 

la tho 

Mbdttte ISO MKl the coatroVfaiter&eo logte 
' on m aepmte oubbowd, which ph^ into tho 
140 and the aehi port oC the cootnl 
IfeddUMial ttodulee 130 ere uwd, eech off 

la the presem i^pikaBtte. Ude eoattoVlB^ 
of the cootrol processor nodute lia However, wheth* 
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M pwt of moduto 110, Um 
reg Mf dIng the timing and oootnl ctencterMa this logle. 
The presently preferred embodiment uses a 32 bit data structure. Each floating-
point number is represented by 32 bits, and therefore 32-bit units are referred to as
floating-point words (or "F_words"). In the presently preferred embodiment, the
format is 24 bits mantissa and 8 bits exponent. This can be either IEEE
format or DEC format.
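
For the IEEE case, the split of a 32-bit F_word into its fields can be sketched as follows. This relies only on the standard IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 stored fraction bits, giving a 24-bit significand once the implicit leading 1 is counted); the DEC format is not shown, and the function name is illustrative.

```python
import struct


def ieee_single_fields(x: float):
    """Return (sign, biased_exponent, significand) of x as an IEEE single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # Normalised numbers carry an implicit leading 1, giving 24 significant bits.
    significand = (fraction | (1 << 23)) if 0 < exponent < 255 else fraction
    return sign, exponent, significand
```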

The Internal operatloa of tlM Ooatiog point 
Tbm ftaturea of the tntartee to the 
140 win than be rtlirumuj in 

4A through 4D riww key povtioM flf the 

^ ^ I ilti- mrftmri rnnhnitlimin 

410 which k wed to htterftne to the 
acane key porttea of the data path b the 

in r»--| i^TT rnftrrni rtn^nltni^lnl, &r 

the mieroinatructkQ format uaed ia the aoata^^ 



be, aeleetabtf, either lEEB 



130 wa tot be 
110 and to the 



laoi 



110.1Vm4B 
4Cihowe thekfie 



4D 





^ path of the pteeent^ preferred enteSment to quite 
This path includes a floating point multiplier, a floating
point ALU (arithmetic and logic unit), and fast multiport register files, all controlled by

to the inner ditapatha. to hold lookup t 
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tbm topoleiy or tlM Ie«Haf««t <IM* li MB aoM dMr^ la Figur* IB. ttab 
path, and its components, will be referred to as the Floating-Point
Arithmetic Unit (FPU). The FPU includes fast register file 430, multiplier 440, ALU 450,
memory 1610, and local buses 431, 432, 433, and 434.



OoatiBciMlnt oeleutitiao unto uaed ia the Ooaib^poliit 

lao are Uw floettariMiDt oudUpliar (FUFV) 440 wd flott^-potat ■flihmelle aod lo^ 
iiaik (FiU«) 4«a Boa p«u l»e ^ aterihr ta»e«tf a««tec^ 
in their dite hndliit (ffft from the dtStarait vUneliB >t^»«~i> b the eUn 
[path In the FALU 450 te -~- — npniniiiiia. 

The pneeM^ p>«tend rnnhndniuut mm hitavtted dradti (fimOat pckaa tUp 
gee) ft— Wpohr faulted T «rt-n l n i> u . (BTO. » ftfc^ TH. m 



pert nrnnlw re, and the ecpdMalaat uvina a ua tt ai^ me lfidil|iMiii 44ife BUIO 

or ADSPniO; ALU 46ft aauo or AD8P7UD! Ri^ Fie 480: Ba»0 • iUJaPTSML 

T1» OBdItpliar 440 and ALU 460^ and the tat le^tar illae 4S0k ettuel^ iM BCL M 
toteni-^.Hbw«r. their to»eAe..aodp«,er «.p^ «• TO. ' 
hOT*aftdl644iitdBtapathinterM«r, «ith3S-tlt. 
cUpe hM the capaUtty to da tafid e44it > 



The FMPY 440 and FALU 450 each have two 32 bit wide input ports X and Y
(connected to local operand buses 431 and 432 respectively), and a 32 bit
wide bidirectional port T for results (connected to the local results bus 433). Each of the
calculation units contains a latch and multiplexer, and the output port
contains a multiplexer, so 64 bit wide numbers can be transferred in or out.
The result ports of the two calculation units are connected in parallel (to results
bus 433, and thereby to write port 430D of the register file). This permits the



to Map dita witheot mua eAetaal iwiMn i tm— or routiaff ^ tlBai«h the 



a. TUa ia oeeAil, fbr aaaavlab whan auai orpreduetecakidatkaa no doaau TUa 
la abo uaaAd m pa i m i mug npid data trnfer to and fton the — -*»itH 

lAUlBDwever. a rf i ti ir i i ua oftMaoB nfl f iaaUiM iktlitlii^t |^ p^fY4Wflnd 
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PALU 4fl0 ctt not b» acth^ •! Uie «na ttait (ewpt 
I the output porU are together. if tht 

t on the kiput «id% oiaoe both devioeo ahM the flame dste psth from 
the regieter fileo 480. 

input and output ports to be registered or transparent. However, in the presently
preferred embodiment this capability is not used, and all the ports are registered. The
internal datapaths and the function unit of both calculation units are all 64 bits wide,
and can perform both single precision (SP) and double precision (DP) calculations.
The function unit in the FMPY 440 supports 4 arithmetic instructions. The

I On nenoeecoode) te both I 



15 OMde 



root 



40 


» 


200 


300 


doo 


000 


40 


60 


46 





20 




30 



SP -> 32 bit
SP <- 32 bit
DP -> 32 bit
DP <- 32 bit
SP -> DP; DP -> SP;

SP -> 64 bit
unsigned; SP <- 64 bit
DP -> 64 bit
DP <- 64 bit
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iBfcager: add (whh 0. 1, cmrfh wUnct (with .1, , 
or iBtti^Md); mln (afffiad or unlgMd); logtel; ihlft Ooglaa or artthoMCk); roCttias; aad 



6 



on tha rewdto bua 488. Since the flentotfam mA« Ain «^ ^ffi^ w— ^ tititlni ihnMl 

hui» dste en be reed direet^ fron thto oMnmry nii2^^ 
4601 or Bttgiiter Ole 430. 

The erlfkoM eouater 1611 

OD t«9o mode Ut% the j 

the mMr tm decw nt e u t the i 

to be ipedfied. CF^ «BiB|4e, cooliiiMtim 

^ be need te operetkm ee e ateefc.) The I 

13 laUperaitatheiDeniofy IdlOtol 

TheopntteoCthli 

with^ . 

the previous 

be reed out onto reMdts bue 438. ooe «Mple of mcfa m 

20 npm aiii m , the aemoiy 1310 li -"Tt^rMtint m ^ oome te reeidto hia 488L «id the 
ALU 430 ii comondad to reed an opareod ^ fren the reeulle bui^ while the 
440 le werUn^ When the awWpiir 440 finii^ 
ihiia 433. and the ALU reeda ia that vehie ae a aaeood operea^ 

dilm tile m onto the reaidts bue 43Si eMa the awmoty 1310 la oooiaMDda^ 
20 tlMti 




I cea be loedad Into muhipttv 440) 
Thii tMe aiao providee e veiy oomenkat aton«e fbr 
Thia la p a ilind a tO coprenient when 



One 



The r o g ia t i r fltea 430 form the , 
of the register fllea rune in p«rthd 



with tha date 
with the CP 



14a 
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I with tha FP holding ragiMm 430 through local tnaite bia 422 (( 



to Udbauiuual part 48QA (Fi» 10)). Hm other h«ik ruM vaOmaoutir with the FP 



10 



•Ml toterfiw. wJth opeiwid b««e. 481. 4aa (fo«» i»rte 4308 end 430O, 
bus 433 (read port 43QD). and loopbaA eoBiieetka 484 (writa pert 4S0B). 

« J^B««««»«"n«kVPortfciiiaofthad«»pathtathaBMdHl»iaaiu 
main cache bus 144 (which is 256 bits wide) is interfaced to a series of four FP holding
registers 420. (These holding registers are actually paired, so that a read register is
paralleled by a write register. Thus, there are eight holding registers 420, in total,
to provide a bidirectional 256-bit interface.) The eight holding registers 420 have
separate enable signals 421. Thus, this bank of registers permits the 256 bit cache
bus 144 to be multiplexed into the 64 bit wide fast register file 430.
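
The multiplexing of one 256-bit cache line into 64-bit register file transfers can be sketched as below. The ordering of F_words within the line (W0 in the least significant bits) is an assumption made only for illustration, as is the function name.

```python
def split_cache_line(line: int):
    """Split a 256-bit value into four 64-bit groups of two 32-bit F_words."""
    if not 0 <= line < (1 << 256):
        raise ValueError("line must fit in 256 bits")
    # W0..W7, with W0 assumed to occupy the least significant 32 bits.
    words = [(line >> (32 * i)) & 0xFFFFFFFF for i in range(8)]
    # Each group is what one 64-bit transfer into the register file would carry.
    return [(words[2 * g], words[2 * g + 1]) for g in range(4)]
```

Each returned pair corresponds to one 64-bit transfer, so a full cache line needs four such transfers.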

tMa OMMpleBDC to poftenad prtaMrib te ooat MMOBaL na tet i 
480 ara veqr oqpariva eUpa. tJri^ ftor < 



tha eoot of tha qyatem. Uoreow. M BM ba I 



W - iF%».8aMha(ba»pri»» 

» or UMoa da»fcaa to fwy rtgi i flr aiil («h»» to thafc wiy hi^ pfa eoint). 
t of thaoa paeki^ rather 1 



1 48(^ aad aot aare^ two. BMh of the I 

chips is 18 bits wide, so four of them in parallel are used to provide a 64 bit
interface to the local transfer bus 422. (Note that the interface to

aO tothapreaaat^preftrradimtinillm ■ tharaatof lB-^%i,.,..l.,-,Ttti<ftBm 

g n«* -.Lu.. ^ daepwTlMM^ tba644dt Hdahit -*^ 

to heal trmfcr bua 488 ra«p*a. fiw de^ to ha uaad to paiaaal (For ehrt^r. 
4Bahw wthawgtotaratoaalftot»a>>two3>-btowhtoaiefc'nitoha»»toAoirtheiwrd 
ad^eaa odd/lB««n itatua atniGtara teuaaed belov. Fignra 16 ataip^ abowa tha icgtotar 
M fl^•480a.•*lgl•fltoJIhthapw•«,t^yp„fc,,adbe-t■od^th.a.do*e.h«.be•n 

part noBBbar BOSIO from BTT. 

would be 256 bits wide, to permit a more direct interface
to the cache bus 144, but this would require significant added hardware expense.

80 (with lii n rtol ■ II eootrol logto), to OKritiplex tha 3504* intartea to abblb^'ul'di^ 
011*0 . 64 Mt wide port 430A. T1»a miilt^leria, «»d date toutiiw to controlled by t 
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•n^ ha m uM MM m JMm M ci thea» tOm «re pf«renfa^ am follows. (The 

to tiM nglMn 420, tlttoi«h U&m m 

EMh <)rUMr««lMermn 430 htt two trniipM^ 

Uie r«glMr Se 4aa IImm outputs 430B and 480C drtro loe«l opmnd '^jTiflT!'^ 
10 438. -r-. 



L of the Mgiiter fliM 430 bM an inpuft pott 480D wideh < 

to>thifdlDertdil>to4aaLwfaich wabstsft^t^^.! ^,«n«^Tii, 

bui k ccBOsctsd to tho outputs oT th» csba^ton t»if xoi ^ 

Baeh of tho rsglilsr ffles 480 hss snotlMr Input port 481* ^ 
W **««>I«rt480Bl^k»pb«kooonaetta484to*sto^ 

tht Wiltsr write port 48Qa Tl^ sBous <Wn to bs\ 

to enotbsr without banrfng to fo throi^ the ALU 400 or 
4M. thus two <9«lM of dd^^. ™. msens thet dstn cn be tnpi,^ 

mm ospefaOlt^ on be 

I in bsnttv sttbrauthMSL 
Thus, the five-port register files 430 each have two read ports (B and C),
two write ports (D and E) and one bidirectional port (A). The read ports feed operands to the
FMPY 440 and FALU 450, and the results are written back using write port 430D (or, if
desired, write port 430E). The register files can store 128 F_words.

The data, address and write enables for write ports 430D and 430E (and the
write part of the bidirectional port 430A) are registered internally, and the
write pulse is automatically generated.
The two read ports can have their data paths registered or latched (both



80 betheae«i>.andthe«ra«waaee«gi.teredorlrtd^ 



proeeaaor ooodiile 130 to to n^tar the liifrfiMue m theee am 
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<&ee4r ta& th* oakroeocto and to liold tte dito ktdM tnupttreaL The data is 
I tBftmaOr to tiM FMFY 440 tad FALU 40a 
Tlie Mgl^ flto ctt op«^ in • Nnita thwwih* mo^ 
> ue the sMBft. la this iDoda the writlett data 
aame <7cK but abeui 10 na later than a maal read operate. TUa la uaeAil tot 
r«curwfe «r aalar oalcuiBtkoa where ia ed»ai^^ 



I fbr the read port 480a read SMWt 4a0C and writa port 430D. 
ltagrflfiidaortheFPnikroooda.ThiBall0«o rt « ra Q£ rd (;ype i 
to be perAnaad wilhta the oooatndnta oT the pipeia^ 



the hlMr BttitHwted flMt reglMr ffle 430 it a k^r tfoaeafc hi pr^^ 
' itweea the ooatrol proBieetui laodide 110 

lao. Tbe ad^eaa ^aee of tfaia regtatv file to 



to eet ea a double bidte. M ai9 
of thb ff aalB tm > ffle ma rai Mi ■■ mi n mi i i to the 



lia m the othar beak am ^nc hn ai wM< y to the 
130. (The oporatiDaB wfaleh are rp^i umluumuM to the CP i 
ki detai beknr. Theee tjtmak nyiuhvmm opvatim be i 
profWag a traatitioQal ckicfc doaadn, which he^ hi pfOfidfa« a h^kteadwidth 



t to be aiade on one Me or the boiaidafy without aflectfav the Other aide. 
tWa deea infeerftoe providoa a migrarino path to (hater* or auMOb Inifgiaied 
Mt% aad heaoe providea lloattaf poiat device fa^Ay'^n^impft (The 
iaae or the haadihakhif bgle wa be deaoribed hi neater det^ below, with 

ft to Ffps« 28, where the iatetaetioa between the CP niodide aad the FP niodiile 
ladaaorOwL) 

tha% the register file 480 ia double buffand te the aoraial e»^^ 

r, unlike pm agroteaa auch aa that or ngne 18 aad 10^ thto ^^'^^ i^ 
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!■ not iBfltdbto. Tina, bolh tte eootnl prnriMm moduto 110 

P*** »Odute 130 CM MCMi ttV 

ai« 43a Ttie to that the not loetod ou^ 

ffraat admitastt, «a wffl be eeea below. 

Since hardware access is not cut off, each access to the register files must (at
some level) specify the full 7 bits of address (A0-A6). Where the double buffering
is being used, only six bits of address are actually needed (to address a location
within the currently available bank). The double buffering is obtained
by modifying the top address bit on the fly. A mode indicates how the top address
bit is to be modified.

Iliue, the regtoter file eilrtiiimin ipedfied in the 

witnme ti ce ^f by haydwre> Thm double bi^farii^ to eomwOed ty > ' 

whieh defendnee which tMdT oT the ngleter file the I 



lao 



i ii not eootroBad diree4f llM aiooeed* BdAih bat it (ag^ (ty I 
lo^ oB<r mbm both Um eootnl praeanor moduto 110 

Th. doubl. b«a«ii, OM, p.rtMflBtog on th. top adi^. bit 
4B Am two IBM .Ue by itda^ to indieate dinUe-*^ 
I to pattitfaadng «a tha bottaat adAraaa bit (AO)J 

Each register address (7 bits) is accompanied by a two bit mode field, which
selects one of the following address modes:

Physical: This uses the address specified without any modification.

Logical: This is selected when the automatic soft double buffering
is used; it causes the most significant bit of the address to be replaced by the bank
select bit. The control processor module 110 register file address would use the inverse
of this bit.



™* ■Ho*" tba flcattaypoink praeeaaov nednla ISO Utjux.i 

the data OB tha other *le of the bauk. without hwlDg to ««p tha b«4. or uto phN-I 

I To keep the caleulatiott pi peW i i e fiiB when croering a gn^ ' iM M M fi^ fff f ffp 
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aeam to th« new dsta U needed (if tt hM been ifflported yet). However, there wUl 
normally be a delay due to pipelining: the banks can not be swapped over until all the
results for the current bank have been written. This mode circumvents that delay,
since a read access can be taken from the opposite bank of the register files 430, before
the bank swap is actually performed. This is accomplished by replacing the most
significant bit of the address with the inverse of the bank select bit.
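
The address modes can be summarized in a short sketch of the FP-side address modification. The mode names and their numeric encodings are labels chosen here for illustration; only the behaviour (top bit left alone, replaced by the bank select bit, or replaced by its inverse) follows the description above.

```python
from enum import Enum


class Mode(Enum):
    PHYSICAL = 0  # address used exactly as specified
    LOGICAL = 1   # top bit replaced by the bank select bit
    PREVIEW = 2   # top bit replaced by the inverse of the bank select bit


def modify_address(addr: int, mode: Mode, bank_select: int) -> int:
    """Return the 7-bit register file address after modification of A6."""
    if not 0 <= addr < 128:
        raise ValueError("register file addresses are 7 bits (A0-A6)")
    if bank_select not in (0, 1):
        raise ValueError("bank_select is a single bit")
    if mode is Mode.PHYSICAL:
        return addr
    top = bank_select if mode is Mode.LOGICAL else bank_select ^ 1
    return (addr & 0x3F) | (top << 6)
```

In this model a preview access with a given bank select value reaches the same location that a logical access will reach once the banks have been swapped.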

Figure 20 shows generally the logic used to accomplish the address modification
for double buffering. The right side of this Figure shows the interface
to the CP module 110, and the left side shows the interface to the remainder of FP
module 130. Thus, the data connections on the right side would correspond to port 430A
(shown in Figure 16), and thence to FP holding registers 420 and bus 144. The
data connections on the left side would correspond to ports 430B, C, D and E (as shown
in Figure 16), and thence to multiplier 440, FALU 450, etc. The address inputs on the
right side would correspond to data fields extracted from the microinstructions called up
from WCS extension 490 by the CP microaddress bus 211A. The address inputs on the
left would correspond to data fields extracted from the microinstructions called up from
FP WCS 470 by the FP microaddress bus 473. (The register file 430 has internal pipeline
registers for the address inputs, and therefore receives the microinstruction bits
unregistered.)

Two address modification logic units 2010 are shown. They are essentially identical,
except that their connections to SEL and SEL-bar are reversed. Thus, if the CP and
FP attempt to access the same address in logical mode, the address modification
operations of their respective logic units 2010 would result in opposite A6 bit output
addresses, which neatly implements the double-buffer function. The address modification
logic unit also receives the top bit (A6) of a seven-bit address taken from one of the CP or FP
microcode fields. It also receives a 2-bit mode signal.
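
A minimal sketch of why reversed SEL and SEL-bar connections give the double-buffer behaviour is shown below. Which of the two units receives SEL and which receives SEL-bar is an assumption here; the function names are hypothetical.

```python
A6_MASK = 1 << 6


def logical_address(addr: int, select: int) -> int:
    """Replace bit A6 of a 7-bit address with the given select bit."""
    return (addr & 0x7F & ~A6_MASK) | (select << 6)


def cp_and_fp_addresses(addr: int, sel: int):
    """Addresses produced by the two logic units for the same logical address."""
    sel_bar = sel ^ 1
    fp = logical_address(addr, sel)      # FP-side unit, assumed wired to SEL
    cp = logical_address(addr, sel_bar)  # CP-side unit, assumed wired to SEL-bar
    return cp, fp
```

Whatever the value of SEL, the two results differ only in bit A6, so the CP and FP modules always land in opposite banks; reversing SEL exchanges the banks they see.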

In tha actual hnptemantatioa of the preaenthr pr^erred emt 



logic units 2010 are used on the FP side (one each for ports 430B,
430C, and 430D).

The complementary bank select signals SEL and SEL-bar are provided by bank
select logic 2020. These two signals are reversed whenever both the FP module and CP


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rtqoMted a bank mpw CHm lofki which 



thli k dewibedhi 



faitoberofvtbv 



In hnp i mmithi g tha iw fc fr f mwWIra Uon kigte 2010, 
ihown ia F)gur« 17» hM b#cn MUsd. ^'t^M'^ k0o aohw a 
OHgr tMAi^f ba adapttd for tM ia aiu9 

Mmv inttiiiftcfcuKa Ikli aliBe^ 
tha^ ICk Hdwq¥er> any rmriwii^ which fa doo» on th» 

fcba c^p mMt ba addad to chip's wli^ ttea. la ite 

tha adt doub>a>bulferii» miam teat d^aib-L Th^ ^^^a ^ ^ 
logle 20101 hi iwrniMifi^ tha aoat itgaiBanfe fail tha 
litha'Aflrbit) toi mp leme ttl tha lofka^ pl^^odi nd pnvlaw 
an CBtn 10 oa onto tha c^da tfaDa* Whn (m tha 
thacydatlmacaiibalaaathMiaOM^tkfatoa^ 

la adefitkxad logk; aa ihawa hi F%va 17, wm 
» ^ eitm 10 n from tha <7«te tlna (on oaaogr <9cl^ 

tha aatop tfana baa abaa4f baen paid te hi tha aarte <9dcu 
ifea and naw dita la aeoeMd in tha writafaia 
tha unraglitarMl mierocada fatU wil not ba ■tablt^ 1te«fa% tha 
to ba hmnad a^dn, unnaeaMri^. 

The logic shown in Figure 17 holds the modified address bit in a
separate register 1740. A special microcode bit (called "use old A6") is used (via
flip-flop 1720 to control multiplexer 1730) to select that the old A6 value (fed back from
register 1740) be used, rather than the microcode derived one. (When using a microcode assembler,
the "use old A6" microcode bit can be automatically set by the
assembler, so the programmer doesn't need to worry about this optimization.)
The multiplexer 1730 is contained in the same PAL as the address modification
logic, so this multiplexer does not introduce any additional delay.



hi 
tha 



to 
fchli 
Inthli 
>aa tha 
(WCS), 



hi a 
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^ tb* opmttoQ or tbe oumerk 
uiibm coatroi p r o c — m aodute XIO. Mm oT tUt 
but is rji nlPu Bwl tif tha mkmode cTUm eaafenl 



""i^ Bill M 
At rttwniaged «bovi, 
flo plroltod fay aa 
Jogfee ii pli^ria^f <m the FP 

module 110, end InteffiMe to tbe CD bm, Ttmn 
or Um cMdie bus interftee^ to mttH«» Uie tnu^ 

14a the FP boMiBc ngliten 420^ eiKl the NfiMr ^ 
of this intertee are: holdfa« regteten 4S0; data 
boa logte 21ia 



tofle; and lootf tnunte 



10 



The 



ootte 



Iff 



420 iodude tit^ 
Md iUe nd e write iide iMeli, 
with 2S0 bite oa the 
iUe. Ttie outpot 



to <Mfe the 64 bit toal tnnte bw 432 to the 
of tUa iBterftee wffl be dtaeiMaad hi 



(Thtee 

Hbut«i^64 
ooeortheftur 
4ao.abe 



20 




The 



2S 



>tethe 

48QL ^flthte e ifaigle tranifer ^dft theve i 
of Fjeoffdi that can be tnuHtered to or 
ao gnamted by a dedicated cloek» wfaUt ra 




in FIgUKe 21. TUa kgfe to 
420aiMltbefegtoterlBaa 
4 nkiar cyelei^ oorreapoQiBBg to the 4 piir* 
the reghter IBe. Tlieae nfaiar cydea ate 
'wy hi^ flpeed. 
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Tl» tmirfer eta* gwmtor 4U provide 
a major transfer cycle. It is triggered to run when both the CP clock and a transfer
enable bit indicate a "go" condition.



The transfer clock generator is partly, but not entirely, asynchronous to the CP
clock generator 250. A high-frequency ECL circuit is connected to a 70 MHz oscillator,

MBp^ tae|» looping llw^ oo WMy e4B» or the h^JhftwiueiKy clock, 
10 »»«*»ci«41«.ae«»th.».Vrt«»,mb.<let«udirttU».t^ 
tte fal^MkvqiMaqr dock. 

Whan th» ■» eaadtloB to datoetad. the ttorfhr dock 



tba owirequiav oecfllBtar iapnt. to Fndw* the 
eyote. Depen di iig oa the 
W b«twwn t«w and flw dock haete wO be B««iuoea dwM • 



<9cie 



9<lab 1h* 

te^*aee<r ^ ** The a«|iicBlMeeeaw to) 
Phi— ^7 .rft— -ti^iy .i .1^ ^Ynllim rn li l l ii i H Iim i. TheUtftapeed^ 

CM 480, «Udi docks the opantion oTthe cafadetkn: larila 440 and 480 (M wal 
ao -dat. parte 4aaA,4a0H;480Ci480Dt and 4808 «f the whaler Oe 480) k( 

bo*toi^teMfcfenefcAb«»tho»iBdww»theldBfcipeedBCLtoopofthe( 

4U. iU>oi« thai led»«i the anride-^whfch (with th.CT«fcn«dedo^ 
*^ttMiftr dock oenantor 4U on e naior tnorfkr <9de. Above that ie aho«ni the 

^ tnnd^docfc. iU»«th«iadioa«theCP»fc««d,dock.gana«t«llvCPdodk 
25 eanaratoraaOu 



The transfer clock generator in effect provides an intermediate clock zone
for the transfer between the data cache memory 140 (which is controlled by
the CP clock generator 250) and the outer bank of the Register File 430 (which is
controlled by the FP clock generator 480).
The clock boundary between the FPU and the data cache memory is a very

— boundanr. niia bouadaiy cwaeae not BMie^ a dock pbeae bonndaiy. but alae 
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a polaittoar dr«tfc 4lflfcr«ee In clock 

budwldth maom tUa boundnj la crttkiL 

The presently preferred embodiment bridges this boundary in two steps.

The double buffering of the Register File interface provides a transfer from
the FP clock domain to the transitional clock domain.

High-speed multiplexed transfer (from the outer bank of Register File 430
into the FP interface of the FP Holding Registers 420) occurs within the transitional
clock domain.
Transfers from Holding Registers 420 into Cache Memory 140 occur
wholly within the CP clock domain.



i IMiiiita re^ffdiog Uite dKk teqiwncj diff^^ 
Tbe r«lstioa between tbe BiDar tnaiA 
W qjde tine i. not eecfctantet Ae nota^ 
cooMO from dbuiactie op m rttepe. For ean^ in o vector nd4, two opemte nd oae 
reeult mwt be tf«irfefTed between the feiJ>^ 

the register file, the two operands will be read out in parallel, and (pipelined with this)
the result will simultaneously be written back into the register file. Thus, in the worst
case, two words must be written into Register File 430 and one word must be read out,
for each calculation cycle of the calculation units.
bat» in BBMiy appiicetko% the enrer^ m» not be i 



.itiO] 

pwrfbrebtothefc the alaor c;yole diwtion, d^ 

cjdB. 9bmM preferebtf be in the nB«e at one-holf to one-tfaM tiaee the 

duratta oTacelculetion eidein themj.Howwter. the adva^^ 

uiing • trnndtionri dock domrin, cMi be largely obtainod even 
n ie not nwt. 

Note thet the edmrteee. of the trendt^ 

to -v-U— p Where Wgb-^eed numerie oilnii a Hfl n uniu are ueed. Ita dock inter&oe 

•rddteetiffe <to«c ri hed penniU auefa unite to be ieotated in their own clock dooen. ao 

theft their dock can be run at the maidnuBn poaAieL llrie tOMdi^ 

and is independent or the device techmOosies used For eianqde^ ^ 
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thm 



To katp tlM mmdw or eoQtrol 40Mfti <kfWB (1^ 

th» a P^woffda m tmiterod Arom Um holtt^ 
420 Into thft rw rtrt tu ' Om 480. Ite oted Ibr thli ia dwtf itonuMwliand tiy 

d» wifch eKii addrai raqiMoc 9 bite to iptdQr II; 1 
UtaafadftwiliiftnMtioo ia total 
Tteiopat osqM j _ 




itofBplMBent Mlldoiifelt 

TUi dutarmln M tte 

I 

rbteeHkbadsaBadbrs 

3 fate oTtfaa CA bM UL IT the 

LtbtF.wofd 

I a tniMftr cTcto to oeeur ia tha 

llOcgrda. 

Mgtllltn ffrtn ^-^"*T HOT t^tha 
I part ki tha tfooafer. 
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<Mro the 64 bit (ku buft. If tto tnmte dMta k tem tht f^Mu^^l^^Hm 
boUing KgiMn 430, Uwn me <tf thoM 

Hi44Iihi ngj^mr goup ctoefca fft) . Thm iM fbur dock miWc^ wfakh 
MquMioft Uraigh the iMittmi Ilia llOl. lOll, 0111 (m ' 
four eloek tuhlei aeloet tb« 64 bit ipoup 
two of tha 32 bit raglitm wffl ftetuaq^ bt MMblad to «^ 
(Kot« that tto itm poritta withfai tlM pgttam 1 
or pttttans depcadi on tbd 1 



on the 



bus 429. 





the 



■ 428, end vegiiter fliM 4801 
meturiof. Thet it eppeere to the 
boB 422 the regite fie 430 opeere to 



140, 

TlieTegliterffie4aO 
FPU to be 32 bite fvid«» but to the 
be 64 bite wide. 

The gee of m t wo w u J i w id 
428. with • trooite dock of (eabetftf«(f ) 
•Kdied to the eitfit regietero 42^ li ^ 
fn petnitttag the oee oT o etatie dock 
kraeturo geu built into the file 
tMi ii shown irhenietfc« Bf in Figure 461 tile 



file 430 to local tmte bw 
then Ibtir ari Boc^^ ^ftl e phoeoe befaw 
in mewfmitin g tnoite ipeed (nd 
\ n dde ceeet ii 






FP hoWttf rogiilm M hsiefaad ia oppoitt* dbwtl^ to evea and 

which is odd. The result of the double-word transfer is that any even F_words which are
transferred (W0, W2, W4 and W6) will map to the left side of the register file 430. These
will therefore map to even register file addresses as seen by the FPU. Correspondingly,
any odd F_words which are transferred will map to the right side of the register file 430,
and will therefore map to odd register file addresses as seen by the FPU.
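
The even/odd mapping can be sketched as follows. The sketch assumes that a block of F_words is placed starting at an even register file address, so that F_word k lands at address base + k; the function name and the `base` parameter are hypothetical.

```python
def place_f_words(first_word: int, count: int, base: int = 0):
    """List (word_index, side, fpu_address) for a run of contiguous F_words."""
    if first_word < 0 or count < 0 or base % 2:
        raise ValueError("indices must be non-negative and base must be even")
    placement = []
    for w in range(first_word, first_word + count):
        # Even F_words go to the left side, odd F_words to the right side.
        side = "left" if w % 2 == 0 else "right"
        placement.append((w, side, base + w))
    return placement
```

In this model a single F_word keeps the parity of its position in the cache line: transferring only W3, for instance, places it on the right side at an odd address, which is why the FPU needs some way to learn the odd/even status of scattered data.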
Tha toaa aarioiM impttcatka or thto to tlwt if a aartoa of aattar^ 

frcmcadia aanwffy 140% ^ t^m^^^A^ — -ttit aiifrniiiM, Hinji mi;, Ljf 

of tha i rlfb to tha Mgtotar fife 480 can bo laad. 

Tte OMM toBportam run al rt wa i to to that if o Ai«fe p.wwd (te eampto) to 

tha data ooold end up at 



m th. a»a n M« to b. 



uro that thto dito WW aecaaaad Ov tha FPU) fh« tha I 
^ T1» fff aaa aHy pwferwl itmhwf l n ^ m ptofidea the qwr with flf^ . 

L Hia CP modtdo UO oaa Auffla data ia tha DCH. ao tfea i 
dfraaalaXKai 

Z T^aaate togto eacTfea a atafeiM Ut. tewh« whate tha tal 
f to an evan or odd addkaaa. Tha FP kgto CHI taat thto atata fait ] 
thto oohr wip p B aa faif ha aiaUu n fbr tha vaty tet tm^. 

8. Oouhfe writa ^afea could bo ttad» with ifefta vaHd ^ to ] 

or tha w«H adikaoa to ba IpMtad at aona painUi nat to; tha 
•etaal^ iMd hi Bagtotar FQaa 480; la tha praaant^ 
includes two parity bit locations for every sixteen bits of data. Since the
presently preferred embodiment does not use
parity checking, these extra bits are available for other uses.
In particular, they can be used to carry "data valid" flags along with the data.
fima tha holdhif regtotera 4S0 would write a pair oT F.a 
> hito the two worda OQ both aidaa of the 1 




4. A register bit written by the CP module 110 can be used to indicate
the current word location odd/even status. The FP module can then test this bit
to do conditional branches.

5. The CP module 110 can change the FP program flow, as a way to
inform the FP of the current word odd/even status, by changing the start address in register
478.

A further alternative is that dedicated hardware could be used for word
swapping on the fly. This alternative is not preferred, since such dedicated hardware
would add delay to every transfer (whether swapped or not).

The control of the transfer doesn't allow noncontiguous addresses to be
transferred within one major transfer cycle. For example,

cyclee to transte WO and Wa teoi tbs hoUfaif rec^teta into the ] 
if Wl could be tr a nrff ii unl aa wen (efonif it h newg ueeA thiii> ^^.^] 
cydebi 

~~ tern the B sg l st sr FBe 480 to the ceche menoiy UO, 

aettered wsitea are beinc pevteoMd. Tbe jesteied 
in tfaia oaae is to partem writee to both sldaa oT the Bstfstsr FQe 
niak i% aa shown oi Flfore 4a the two (pl^ikal^ 
File portfaoa 430 and 480" can taft be enabled* eo t^ d^ wv^ttea in j 
488 la wttttan into both the erea and odd mtmOm. tKw A,p§Tn tfd ^ it 

out to Hokfinc BegiBtflra 42a it CB be written into aO eight or 1 
ClUa taKtloQ la activated hy the BRjMtJOk bit to \ 




As noted, a clock having at least four beats per transfer enable is used for the
transfers between holding registers 420 and register file 430. In the presently preferred
embodiment, this clock can actually have as many as five beats per major transfer cycle.
Four of these beats activate respective pairs of the holding register banks, and the
fifth phase provides some margin for pipeline overheads. As presently operated, the

oftUa clock are afaoat 30 na. Thsretere^ a aalflr ^de is about 100 ML (Of c 
faMO could bedangsdO 

Tto clock structure showa a sIpiifloBnt advanto^ of the double.^ 

used at the intettee from the eadie bi» 144 to tiie regtotera 48a 
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"" ' »•» t»»o lii pmte condiUaBal Jinpfc Hm tnw aililuM alaM 



tfcta, «M Odd! or tlM oMiMioii. 471 m fed ioto t«o 
"»* ^'S- Ifc^ " git tw patnto thi "trnar and "ftW Hfrwn to te 
tnA>^ M tlia«» depcotfDS upoa tte oiteaaw or a t«rt. OM or tiw oUmt m 
hMdadfcaekiaM the neat Mi mihiitwrtfau nMr iw 471 Tto thaw wtfrtewptwidt 
(■vid eoodMoail bnodi eap^UBl^. 

Note thai M addtttaat ngirtar 4» la provVM. ftr tha aderaeoda faita Uwtda 



478 providM MOM alffiiftnu addifck^ 



frtneertalalte<tfthft mkninafcnKtkn biM 471. m wfll be 

HofWBvw, tiM req uir cmaBU of • stack to um with the trigh ■pnart 
AKhitactUTO or tfa» FP flMdul» M iwimwrnt umtiiaaL TIm pMtntt 
rnnhnmiDiiM pro¥kl« « stack which ooi on|f provklea tho ninmij laft^fak-Ont^Mft 
(UFO) opantko at Ugh ipaad, but alao pravklaa addltk»al OodbiMty which b 
qwMftrdakaB»ag> Aehiawffag thtoflmctioMBtytw^^ 

eotwamtoort way to ImptettBot • ataek ftmctloo h^ 
eoaUa ad writa enafala a^fnala wwa ded toan 
(yaad) oparatioa dccremnfd the counter and evwy V^h' (write) 

lit 

tte ceoM portioD 3820 k ft aniltaewd pipdfaie fc^^ 
ppt ihim AMD CAMP 29620). ThfaMKiMrt^ 

laaotrtPiitmiilttol^eraftMirfyA^i .^.^^^ nf thn rnchlrin TOl fn 

(TlioaonBal mode of opawtkm of • davtoa of tWa tin» woofct bo 
FlFa or to prorida o flwd d^.> 

&i tho anbodfaaant diowQ, the cootMt a«dbtttiea of tfato r«glat^ 
of a PAL mVK to hnphBD a pt tho UFO opoMtkia tlio PAL roeein 
to pop or push the stack. A read stack input is also provided, so that (primarily for diagnostics) the state of the stack can be read without changing it. For use in this mode, an offset input is provided, which can be used to read out a stack level which is relative to the top level.

The output of the register is connected to the [...]. The input to the stack register is provided by the false address, for reasons which will now be described.

Stack register 478 provides a powerful capability for subroutine operation. The microcode instruction which calls a subroutine will state the subroutine address in the true field, and the return address in the false field. A short field of this instruction will also contain a push command, so that the stack register saves the "false" address output. At the end of the subroutine a pop command will enable the stack register to output the return address onto the microinstruction bus 473.

Thus, the four levels of the stack register 478 permit up to four levels of subroutines to be nested.
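The push, pop, and read-with-offset behavior described for the stack register can be sketched in software as follows. This is only a behavioral model under stated assumptions: the class and method names are hypothetical, and raising an error on overflow or underflow is an assumption, since the text does not say what the hardware does in those cases.

```python
class SubroutineStack:
    """Behavioral sketch of a four-level return-address stack.

    Assumption: overflow and underflow raise errors; the hardware's
    actual behavior in those cases is not described in the text.
    """

    DEPTH = 4  # the text states four levels

    def __init__(self):
        self._levels = []

    def push(self, false_address):
        # A subroutine call saves the "false" field (the return address).
        if len(self._levels) >= self.DEPTH:
            raise OverflowError("more than four nested subroutine levels")
        self._levels.append(false_address)

    def pop(self):
        # A return outputs the most recently saved address.
        if not self._levels:
            raise IndexError("pop from empty subroutine stack")
        return self._levels.pop()

    def read(self, offset=0):
        # Diagnostic read: inspect the entry `offset` levels below the
        # top without changing the stack.
        return self._levels[-1 - offset]


stack = SubroutineStack()
stack.push(0x010)   # return address of the outer call
stack.push(0x020)   # return address of the nested call
top = stack.read()          # inspects 0x020, stack unchanged
below = stack.read(1)       # inspects 0x010, stack unchanged
first_return = stack.pop()  # nested subroutine returns first
second_return = stack.pop()
```

The non-destructive `read` mirrors the read-stack input with its offset, which the text says is intended mainly for diagnostics.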

10 oas^JismsataUasi 

The cycle times for different instructions in the FMPY and FALU are different. It would be useful to tailor the cycle time accordingly, to optimize the calculation rate. The most important difference (15 ns) is between the FALU operations and the single precision multiply.

The FMPY has some very long instructions, such as divide and square root, where their execution times are 200 ns and 800 ns respectively. Two options are provided for these long instructions:

Extend the cycle length by the appropriate amount.

Disable the clock enables to the FMPY while the long instruction is in progress, but keep the instruction and data streams going to the FALU at the normal data rate. This will allow several FALU operations to be hidden under a divide operation, which may benefit some algorithms.

The clock generator produces two waveforms: the microcode clock and a write gate for the scratchpad memory. The minimum cycle length the clock generator produces is 21 ns, and this can be varied in 7 ns steps, up to a maximum cycle length of 98 ns. In the presently preferred embodiment, the practical minimum cycle length is 28 ns (since the WCS memory access time is the limiting factor). The cycle time for FALU operations is 28 ns, and 41 ns for single precision multiply operations.

The clock generator is implemented as an ECL state machine running with an input frequency of 140 MHz to give the timing resolution. The use of this ECL state machine, in combination with TTL sequencing logic and high-speed calculation units, turns out to be quite advantageous. (As noted above, the register files 430 and the calculation units 440 and 450 have ECL internals with TTL peripherals.)
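As a rough check on these timing figures, the sketch below relates the 7 ns step to the period of the 140 MHz input clock and enumerates the selectable cycle lengths. The function names are hypothetical, and the assumption that the selectable lengths run from 21 ns to 98 ns inclusive in exact 7 ns steps is an illustration rather than a statement about the state machine itself.

```python
INPUT_CLOCK_HZ = 140e6  # input frequency of the ECL state machine


def input_period_ns(freq_hz=INPUT_CLOCK_HZ):
    """Period of the input clock in nanoseconds (about 7.14 ns at 140 MHz)."""
    return 1e9 / freq_hz


def selectable_cycle_lengths_ns(min_ns=21, step_ns=7, max_ns=98):
    """Cycle lengths reachable in fixed steps (assumed inclusive range)."""
    return list(range(min_ns, max_ns + 1, step_ns))


lengths = selectable_cycle_lengths_ns()
period = input_period_ns()
```

Under these assumptions there are twelve selectable lengths, and the 28 ns practical minimum is the second of them.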

The clock generator can receive the following control inputs: a stop or start command may be received from the VME interface (i.e. from the host), or from the CP module 110; a length input field from the microinstruction bus 471; a stretch input will cause a "wait state" (or longer cycle length) when the CP forces the start address register to be the microaddress source for the next FP microcycle; and the breakpoint bit is also connected to the clock generator, and commands it to stop instantly.

As noted above, there is also a transfer clock generator 412 in the CP Extension Logic. This clock is not related to the clock generator 480. (However, note that both of these clock generators exploit the advantages of ECL logic in a clock generator which is driving TTL logic parts.)

WssssSBiSssmsSim 

One of the notable features of operation of the FP module 130 is the use of compressed microcode. That is, some logic is provided which permits a field of the microinstruction to be replaced on the fly by a previously registered value.

In the presently preferred embodiment, the field which can be replaced in this fashion is the operate specifier. However, in other systems, it would be quite possible to replace other microinstruction fields in this fashion.

Thus, for example, for operations which mapped two arrays onto a third array (e.g. Ci = Ai + Bi), the instruction register could be loaded with an operation specifier (e.g. "ADD") before a sequence of such operations was begun. The sequence of operations would then be stated in code which did not specify the operation directly.

This logic is shown in Figure 4C. An instruction register 4510 is loaded with an operate specifier (8 bits). This operate specifier corresponds to one of the fields of the microinstructions stored in WCS 470.

In response to the "Use_IR" bit (which is written into a register by the CP module 110, and therefore changes relatively infrequently), PAL 4520 selects whether to enable the output of WCS portion 470B or Instruction Register 4510.

If the "Use_IR" bit were assigned to a field in the microinstruction, it could change at every cycle. However, in this case the extra delay in decoding which specifier to use (and then enabling it) would increase the cycle length on every cycle where a change was made.

WCS 470 is actually physically configured, in the presently preferred embodiment, as 26 integrated circuit memories, each 4 bits wide. Thus, two of these physical memories store the 8 bits of the operate specifier field. These two memories are shown as WCS portion 470B, and the memories which store the other fields of the WCS 470 are shown as 470A.

The instruction register 4510 can be read or written from the CD bus 112.

Note also that the PAL 4520 also receives another bit of input, so that its bypass operation can be disabled during microcode load operations.
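The selection between the stored operate specifier and the instruction register can be sketched as a simple function. The names below (`select_operate_specifier`, `use_ir`, `loading`) and the opcode values are hypothetical; the sketch only illustrates the choice that the text attributes to PAL 4520, not its actual logic equations.

```python
def select_operate_specifier(wcs_field, instruction_register, use_ir, loading=False):
    """Return the 8-bit operate specifier that reaches the calculation units.

    wcs_field            -- operate specifier stored in the microinstruction
    instruction_register -- value previously loaded by the control processor
    use_ir               -- long-term control bit choosing the register
    loading              -- assumed flag: bypass is disabled during microcode load
    """
    if loading:
        return wcs_field
    return instruction_register if use_ir else wcs_field


# Hypothetical opcode values, for illustration only.
OP_NOP, OP_ADD = 0x00, 0x21

# The same microcode word (carrying OP_NOP) performs an ADD once the
# control processor has loaded the instruction register and set use_ir.
effective = select_operate_specifier(OP_NOP, OP_ADD, use_ir=True)
unchanged = select_operate_specifier(OP_NOP, OP_ADD, use_ir=False)
during_load = select_operate_specifier(OP_NOP, OP_ADD, use_ir=True, loading=True)
```

Keeping the selection bit in a control register rather than in the microinstruction means the choice is made once per routine, which is the reason the text gives for avoiding a per-cycle decoding delay.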

Figure 29 schematically shows how the WCS 470 interfaces to the wide cache bus 144. The 64-bit local bus 422, which connects the FP holding registers 420 to port 430A of the register file 430, is also connected to the serial shadow registers 481 which hang on the microinstruction data line 471. (As extensively discussed elsewhere, these serial registers interface the control store 470 to the serial loop used to transmit microinstructions from the host.)

This additional connection is particularly advantageous in the numeric processor module 130, since it permits microcode overlays to be changed very rapidly.

In the presently preferred embodiment, the serial shadow register 481 is actually configured as two physically separate registers 481A and 481B. These registers not only provide a bidirectional interface to the data port of the control store 470, but also can receive data from the local bus 422. As mentioned above, the microcode fields in the CP Extension logic contain bits, indicating the data destination of the local bus 422, which can command this read.

As noted above, each FP microinstruction is 104 bits wide. However, to conform to the automatic shifting of data around the serial interface loop, the shift register 481 has been made 112 bits in length. That is, the number of microinstruction bits has been rounded up to the next even multiple of 16, to define the length of the shift register at the interface. In the presently preferred embodiment, register 481A is 64 bits wide, and register 481B is 48 bits wide.

After the registers 481 have been loaded with a microinstruction (in two minor transfer cycles of the local transfer bus 422), they are driven to load the instruction back into the WCS 470. This will require an address to be placed on the FP microaddress bus 473, and will also require a write enable signal to be transmitted to the WCS 470.

In serial loading, the host uses the CP microaddress register to hold the address of the FP WCS to load (or read), and routes this address to the FP WCS. (Note that the input from CP microaddress bus 211A is fed into FP microaddress bus 473 by the buffer shown at the top of Figure 4C.)
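The shift register sizing mentioned above (104 microinstruction bits rounded up to 112) is a round-up to a multiple of 16. A minimal arithmetic sketch, with a hypothetical function name:

```python
def shift_register_length(instruction_bits, granule=16):
    """Round the microinstruction width up to the next multiple of `granule`."""
    return -(-instruction_bits // granule) * granule  # ceiling division


length = shift_register_length(104)
split = (64, 48)  # widths given in the text for registers 481A and 481B
```

The two register widths given in the text, 64 and 48 bits, add up to the same 112-bit total.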

In the parallel loading mode, the CP places the target address in the start register 479.

Additional logic is also provided for interface to the host. This logic permits microinstructions to be read from or written to the control store 470. This function will be discussed in greater detail below.



When the FP module starts up, it will normally go into a wait state, because of the FPWAIT/CPWAIT handshaking logic described below. To start a routine running in the FP module, one bit of the CP microcode can force the microinstruction address held in the start register 479 to be used as the next microaddress on the FP microaddress bus 473. This action is qualified by the module selection, as described below.

At the highest level, a floating-point processor module 130 must be selected before it can be controlled. In a single-module configuration, the FP module is selected all the time, and some of the following comments don't apply. However, in a multiple-module configuration, the desired FP module (or algorithm accelerator) must be selected before it can be controlled. Several FP modules can be selected at once, to allow data or control to be broadcast to a subset of the FP modules. The FP modules can be selected in one of three ways: a 3-bit value previously stored in a control register can be used; a microcode field can be used; or, less preferably, the CP Logic portions 410 on each of the different modules can run their own streams of microcode in synchrony, so that access arbitration can be performed in microcode. The method that is used can be changed on a per cycle basis.

Once an FP module (or modules) has been selected, the method of controlling it is split between control registers (loaded with long term control information), and dedicated microcode bits for cycle by cycle control. Most of the cycle by cycle control is concerned with data transfers between the data cache memory interface and the register file on the module, as described below.

The module selection is shown schematically in Figure 23. Multiplexer 2340 selects which input to use for module ID. Decode logic 2310 (which is part of the CP Extension logic in one of the modules 130 or 130') tests the broadcast module address against the switch settings in the particular module. Qualification logic 2320 accordingly qualifies a wide variety of microinstruction fields from the local WCS extension 490, as



20 



30 



Tbe 




the control atgnals that Influanoe the 
the data cache memocy inter&ce) into tbe 
are alao inchided to 
The con tr ol 

Transfer start (3) This field specifies which of the 8 registers in the holding register set is to be transferred first. This can be specified either in the microcode instruction, or automatically based on the data cache address that the data originated from.

oecween we isiiiMig reyaHr ana sne ropHcr nnu sevween x ana 
estt be transferred. 



part of 
that

This bit selects the transfer to be between the holding registers and the register file (0), or from the holding registers to the FP module's WCS pipeline register (1). This latter function is used during parallel loading of the FP's WCS memory.

reglstm to be copied direct^ into the read holdliv registers without having to 

into the register COe firsL The main use ef this Is te dfsgnosHr and state save and 



10 



This bit has no hardwired function and can be tested by the FP's sequencer. This allows the control processor module 110 to tell the FP to do one of two operations within the routine it is currently executing. For example, this bit could be used to specify that the data at the even address (as opposed to the data at the odd address) is to be used for the calculation.

Select Instruction Register (1) This bit forces the Instruction Register (see later) to be used instead of the microcode instruction field in the WCS to control the operation of the floating point ALU and multiplier.

Mnrt Bmr it\ lUs bit hddblts «k FP error oonditioa 0» 

*- Vtt * -* ^ V ^ _ - - - ■ - J _ f ai > ■ ■ III I ■! i * *- - ■ ri ■! i ■ II 1 1 

oy tne rir nucrocooej trom generatsng an niemqie ss cne ccanroi ] 
UO*a sequencer. The FP emr status can stai be tested^ the ] 

During diagnostics the host computer needs access to the FP's subroutine stack. When access to the stack has been enabled, this field is used to select which stack entry to read. Note that the stack entry that is accessed is relative to the location pointed to by the stack pointer.



par^ttel nlcrocoda load 



:T1ie FP ootttrol rei^ster cdleete together the 
dodc control and 



Clock control (2) This field allows the control processor module 110 to control the FP's clocks. The FP clocks can either be running or stopped. The other bit in the clock control field is used to select that the FP microcode clock uses the microcode clock of the control processor module 110, thus allowing the FP to run synchronously to the control processor module 110.

Microcode address source This field allows the control processor module 110 to select that the microcode address used by the FP is one of the following:

FP sequencer output. This is the normal microaddress source when the FP is running microcode.

Start address register. This selects the start address register during parallel microcode loading. (A different mechanism is used to select the start address register when the CP module 110 is commanding the FP module 130 to start running microcode from a particular address.)

*^Tft Tiittnit ^ *T ^'^^•'^ r — 

110 to gain aooeaa to the atdbroutine aiadk during ^^•g'w***^ and dei wi g gi ug, 

module 110 to select on which events in the FP it is to be interrupted. These events are: breakpoint, CPWAIT, FPWAIT, register file swap, and FP error. Once an interrupt has occurred the corresponding mask bit is temporarily cleared to reset the interrupt request.

Parallel microcode load control This field includes bits to control the WCS write enable, the WCS output enable, and the mode, clock and serial data in signals. The parallel microcode load is controlled by the control processor module 110, as is described in greater detail below.

Running through the floating point ALU and multiplier is a serial loop that can be used to gain access to the internal state of both chips, and also to load in some new state information. All the internal registers and flags can be accessed in this way. To control this serial loop the control processor module 110 has three control signals: serial mode, serial data in, and a serial clock. The serial clock is driven directly from this register bit and must be toggled by the control processor module 110 to generate the rising and falling edges required.

Start address register The control processor module 110 loads the start address register with the address of the microcode routine it wants the FP to run when the jump start address microcode bit is used. This register is also used during parallel microcode loading to hold the address of the WCS location to load.

Instruction register (8 bits) The control processor module 110 can override the floating point ALU and multiplier instruction from the WCS and substitute its own instruction. The instruction register 4510 (shown in Figure 4C) holds this instruction. The benefit of this is that the control processor module 110 can customize generic microcode routines for a particular operation, leading to a very large reduction in the amount of WCS used for very similar algorithms.
3lstsus8fettSL(9saasftjai^ 

microcode debugging to gain access to some internal information in the FP module. The
status that can be accessed includes the register file address and holding register start

Key fields of the FP microcode format are generally shown in Figure 4D. The microcode word is defined more precisely below. The items marked with a * come directly from the WCS 470, and use the internal pipeline registers of the devices they are controlling. The number of bits per field is indicated in parentheses.

True address This field holds the next address to jump to during normal sequential program execution (i.e. continue instruction), the address to jump to when a conditional test is true, and the subroutine address for a jump subroutine instruction.

False address This field holds the next address to jump to when a conditional test is false, and the return address for a jump subroutine instruction.

Read address X (9) * This field holds the 9 bits that specify the address in the register files where data is to be read from and placed on the X port. The physical address is held in 7 of the 9 bits and the other 2 bits select how the address is to be modified. The options are no modification (physical), and soft double buffering (either logical or preview).

Read address Y (9) * This field holds the 9 bits that specify the address in the register files where data is to be read from and placed on the Y port. The physical address is held in 7 of the 9 bits and the other 2 bits select how the address is to be modified. The options are no modification (physical), and soft double buffering (logical or preview).

Write address (8) * This field holds the 8 bits that specify the address in the register files where data is to be written to. The physical address is held in 6 of the 8 bits, and the other 2 bits select how the address is to be modified. The options are no modification (physical), soft double buffering (logical), or soft double buffering (preview). The address selects a pair of registers, one at the even address and one at the odd address. The writing of the register(s) is controlled by two separate write enable bits. (This feature allows a result to be duplicated in both the odd and even sides of the register file, as discussed above.) This address is also used for the loopback write port which is used to duplicate data in the register file.
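The split of a read address field into a 7-bit physical address and a 2-bit modifier can be illustrated with a small decoder. Everything about the encoding here is an assumption made for the sketch: the modifier is placed in the top two bits, the mode codes 0, 1 and 2 are hypothetical, and soft double buffering is modeled as inverting the most significant physical address bit according to a bank-select flag. The text specifies only the field widths and the three options.

```python
from enum import IntEnum


class AddressMode(IntEnum):
    PHYSICAL = 0  # no modification
    LOGICAL = 1   # soft double buffering, current side
    PREVIEW = 2   # soft double buffering, other side


def decode_read_address(field, bank_select):
    """Decode a 9-bit read address field (assumed layout: mode in bits 8..7).

    bank_select is 0 or 1 and stands for the current double-buffer side.
    """
    mode = AddressMode((field >> 7) & 0b11)
    address = field & 0x7F
    if mode is AddressMode.PHYSICAL:
        return address
    # Assumed model: LOGICAL follows the bank select, PREVIEW uses its inverse.
    flip = bank_select if mode is AddressMode.LOGICAL else bank_select ^ 1
    return address ^ (flip << 6)


physical = decode_read_address((AddressMode.PHYSICAL << 7) | 5, bank_select=1)
logical = decode_read_address((AddressMode.LOGICAL << 7) | 5, bank_select=1)
preview = decode_read_address((AddressMode.PREVIEW << 7) | 5, bank_select=1)
```

Under this model a preview access reaches the side of the buffer that a logical access would reach only after the next swap.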

Even write enable (1) * When this bit is set, data is written into the even register file address.

Odd write enable (1) * When this bit is set, data is written into the odd register file address.

Operate specifier (8) * This field specifies the floating point or integer operation to do and is shared by both the FMPY 440 and the FALU 450. Full details of the instruction set and opcodes for the specific parts used can be found in the manufacturer's data sheets.

data and the loading of the input and output registers: X port multiplexer control (the "X port" is the port connected to the first operand bus 431); Enable X port register data load; Enable Y port register data load (the "Y port" is the port connected to the second operand bus 432); Enable Z register load (the "Z port" is the port connected to the results bus 433).

YtMl m mm ri*rh (ff) -ntMpi^y 

of data and the loading of the input and output registers: X port multiplexer control; Y port multiplexer control; Enable X port register data load; Enable Y port register data load; Enable Z register load.

Clock length (4) Defines the instruction's cycle length. These range from 28 ns to 98 ns in steps of 7 ns.



450 to drive the status bus.

Condition select (5) Selects one of the following conditions to test: force true (default condition); FPWAIT; carry (FALU); divide by zero (FMPY); sticky status (divide by zero); sticky status active; CP option bit; X data valid; Y data valid; address of last data transferred (i.e. even or odd); microcode loop; zero; negative; interrupt flag; not a number (NAN); rounded up; sticky overflow; sticky underflow; sticky inexact; sticky invalid operation; sticky denormalised. The last ten of these may originate from the FMPY 440 or FALU 450.

Breakpoint (1) Set to indicate that there is a breakpoint set on this instruction.

Set FPDONE (1) Sets the FPDONE status flag in the control processor module 110 interface to tell the control processor module 110 that the calculations have been completed.

Swap (1) Requests that the soft double buffer in the register file be swapped over. The swap doesn't happen until both the control processor module 110 and the floating-point processor module 130 have requested the swap.

This field controls the operation of the scratchpad memory and its address counter. One bit is the write enable for the scratchpad memory, and the other two bits select the address counter operation out of: load; increment; decrement; hold.

Results bus output select (2) This field selects the source that drives the results bus 433. The possible sources are: FALU; FMPY 440; Scratchpad memory data;




Stack control (2) This control field controls the subroutine stack logic so that the return addresses are: pushed, popped or held.

Loopback write enable (1) This bit enables a write cycle in the register file 430 through the loopback port 430B. This copies whatever data is on the first operand bus 431 into the address specified for the write port 430D. The odd and even write enables select which bank of the register file 430 the data is written to, or whether it is written to both.

Sticky status control (2) This field selects whether the status generated in this cycle is to be incorporated into the sticky status, the sticky status is to be cleared, or is to be held.

Double precision data transfer (1) This bit controls the multiplexing of data into the X and Y input registers in the FALU 450 and FMPY 440, and the multiplexing of the double precision result out from the Z port.

Use old A6 (1) This bit is set by the microcode assembler when the most significant address bit to the register files for all the ports remains the same over adjacent cycles. This is used to reduce the cycle time for these situations.
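The three options of the sticky status control field (incorporate, clear, hold) can be modeled as a bitwise update. The enumeration names and the status bit assignments below are hypothetical; only the three behaviors come from the field description.

```python
from enum import Enum


class StickyControl(Enum):
    HOLD = "hold"                # keep the sticky status unchanged
    INCORPORATE = "incorporate"  # OR this cycle's status into it
    CLEAR = "clear"              # reset the sticky status


def next_sticky_status(sticky, cycle_status, control):
    """Return the sticky status word after one cycle."""
    if control is StickyControl.CLEAR:
        return 0
    if control is StickyControl.INCORPORATE:
        return sticky | cycle_status
    return sticky


# Hypothetical bit assignments, for illustration only.
OVERFLOW, UNDERFLOW, INEXACT = 0b001, 0b010, 0b100

sticky = 0
sticky = next_sticky_status(sticky, OVERFLOW, StickyControl.INCORPORATE)
sticky = next_sticky_status(sticky, INEXACT, StickyControl.INCORPORATE)
held = next_sticky_status(sticky, UNDERFLOW, StickyControl.HOLD)
cleared = next_sticky_status(sticky, UNDERFLOW, StickyControl.CLEAR)
```

Accumulating status this way lets a routine run many operations and test for an exception once at the end, rather than after every cycle.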

Following is a short sample program (a multiply routine) in pseudo-code. This example will help to show how the innovative features provide efficient execution.

The operation of this example is also shown schematically in Figure 42. In this example, the following points should be noted:

Instructions grouped together within curly brackets {} are executed in parallel.

Normally 8 multiplies would be done per pass through the routine. However, this has been cut down to 4 to shorten the routine.

The double buffering is transparent to the microcode.

The calculation performed is C[n] = A[n] * B[n], where n is in the range 0-3, and the 8 operands and 4 results are at unique addresses in the register file. Note that a ' (prime) on one of these references indicates that the opposite of the corresponding element is meant, i.e. the element which is on the other side of the double buffer before the buffers are swapped.

There is a three stage pipeline: read operands from the register file, do calculation, write result back to register file.

The control processor module 110 clears FPWAIT, which starts the floating-point processor module 130 running this routine.

The multiply routine is as follows:




MUL1: {Test FPWAIT, if true jump to MUL1, else continue}

MUL2: {Read A[0] and B[0] from register file}

MUL3: {Read A[1] and B[1] from register file,
       Do calculation, result R[0] = A[0] * B[0]}

MUL4: {Read A[2] and B[2] from register file,
       Do calculation, result R[1] = A[1] * B[1],
       Write value of result R[0] into register file at C[0]}

{Bead A(3] and B[31 firom rai^ fiK 
Bo ffhlnihtkwy mutt B(2] - A(ai * B[3]» 
10 ^f^vahwcf residt Bill into reglrterfito at C[ll» 

{Do raHitortna, result B(8] - A[d] ^ 3(81 
Write value of result BCn faHo reglatcr file at CK3] 

Teat FPWATT. if tww hmtn to MHTit Mim coptfamel* 

15 - {Write value of result B(3] into register file at C(8] 
Set FPDONB and smnp bufites 
Test FFWAFTflafl^ if true ^mp to MUU dse Jump 

toMULll 

ICULfi: {BeMi A10] and B*(0] ten register file^ 
20 Do raWilstifm, result B(8] - A(8] ^ B(8L 



A«,m^ to MDL4I 

MULft {Bead A'[0] and B101 from register fiK 

Write value of result BI8] into register file at CLZ} 
Set FFDONB and swap buAr% 



26 



Writ! irf rr^lti iV^ rerister file at Cf 211 

{Bead All] and B'[l] flrom register file^ 
Do fwlruhtim, resuft BIO] - A'[0] « Dm 
Write value ef residt B(8] Into register file at G(31 
Set FPDONB and swap bufoi^ 



There are several points to note about this routine:

The routine is heavily optimised to keep the FPU busy on every cycle (providing there is data for it). A simpler, less efficient, version would not include the instructions MUL5 and onwards.

To extend this to multiply 8 pairs of numbers, the instruction at MUL4 would be repeated 4 times with different register addresses.

In order to keep the FPU operating on every cycle it is necessary to access data from the other side of the double buffer without having to do a swap. This is used in instructions MUL6 and onwards.

No time is wasted in synchronising with the control processor 110, providing the next set of data is available (FPWAIT is false).
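The three stage pipeline used by the multiply routine (read operands, calculate, write result) can be simulated cycle by cycle. The sketch below is a software illustration only: the function name and the latch variables are hypothetical, and it models a single pass over one buffer, without the FPWAIT handshake or the double-buffer swap.

```python
def pipelined_multiply(a, b):
    """Compute c[n] = a[n] * b[n] with a three-stage pipeline model.

    Returns the results and the number of cycles taken. Each loop
    iteration is one cycle in which all three stages act in parallel.
    """
    n = len(a)
    c = [None] * n
    read_latch = None    # (index, x, y) read in the previous cycle
    result_latch = None  # (index, product) computed in the previous cycle
    next_index = 0
    cycles = 0
    while next_index < n or read_latch is not None or result_latch is not None:
        # Stage 3: write back the result computed in the previous cycle.
        if result_latch is not None:
            index, value = result_latch
            c[index] = value
        # Stage 2: multiply the operands read in the previous cycle.
        if read_latch is not None:
            index, x, y = read_latch
            result_latch = (index, x * y)
        else:
            result_latch = None
        # Stage 1: read the next pair of operands, if any remain.
        if next_index < n:
            read_latch = (next_index, a[next_index], b[next_index])
            next_index += 1
        else:
            read_latch = None
        cycles += 1
    return c, cycles


results, cycles = pipelined_multiply([1, 2, 3, 4], [5, 6, 7, 8])
```

For four operand pairs the model takes six cycles: four to issue the reads plus two to drain the pipeline. Reading from the other side of the double buffer during the drain cycles, as the routine's later instructions do, is what keeps the calculation unit busy on every cycle.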



The debug hardware on the floating-point processor module 130 is much more limited than that included in the control processor module 110 and data transfer processor module 120, because the microcode that runs here is very much simpler. Also, any debug hardware must not degrade the cycle time.

Access to the register file is provided through the local transfer bus 422, so it can be read and written by the monitor microcode. The FMPY 440 and FALU 450 [...] registers to be accessed. The next microcode address can be read by the control processor module 110, by accessing the start address register 479.

The breeivolnt logie uaee a bit in the adcroeode word to 4 
2S ^Oian an tortrn e tl oo to eneouptered with the breaiqpoint » 

halted and the breal^Qlnt wNtfiia w^wl in the control prooeeeor module 110 Inlerfhoe ia 
eat. Td cootfaHie from a breakpoint* the control prooeeeor modula 110 deara the 
\ Inp"* ksto the doch gBnerator* Onoe aufficiant Internal state baa -been i 
ito the I II I ah ptint, the control proBemnr module 110 i 
do point proceeeor module 180 mhrocode rumdog < 
to min aeoeee to the Indirect aoena a t atu a and the i 



The microcode can only be single stepped by setting the breakpoint bit on every instruction within the routine to single step.

Another feature supporting the debug capability is that the subroutine stack can be read.



5 flrnMrHlnnr^wtiinhml rrwww MMaltJaE 

Figure 9A shows a general overview of a numeric accelerator subsystem including an application-customized numeric processing module 130' (also referred to as an "algorithm accelerator"). By using the powerful control tools provided, the control processor 110 can control a combination of one or more numeric processing modules 130 with one or more algorithm accelerators 130'.

It can be particularly advantageous to combine a general-purpose floating-point unit 130 with one or more algorithm accelerators 130'. In such a combined system, the design of the algorithm accelerator 130' can be freed from the constraints of general-purpose floating-point operations. Therefore, the algorithm accelerator can be designed to be highly application-specific if desired.

One particularly advantageous combination may be to include a complex arithmetic module as one of the modules 130'.

Preferably the application-customized processor is an application-customized numeric processor. However, the application-customized processor could optionally (and less preferably) be of a more exotic variety, such as a symbolic processor (i.e. a processor which has the extra data paths needed to run LISP or PROLOG with high efficiency), or a neural network machine.

The control of multiple mmmie pr o c e ss or modules 130 Qncfaiding algorithm 



Figure 9B schematically shows how the architecture of one example of an algorithm accelerator 130' differs from that of a general-purpose floating-point module 130.

The module shown is particularly optimised to run discrete integral transform operations. For example, this module is particularly efficient at executing the Fast Fourier Transform (FFT) algorithm. An example of the execution of this algorithm will be reviewed below.

In the embodiment of Figure 9B, the register file 910 is even more highly multiported than register file 430. Register file 910 includes four read ports and four write ports, as well as a wide bidirectional port 910A which interfaces to the cache bus 144.

The four lines shown as read ports 910B are actually replicated. Since the multiplies performed will typically not be random multiplies, but will be multiplication with a coefficient (which changes less frequently than the data words), only one complex word of input is needed per cycle for most of the cycles. (However, [...] cycle.)

The four multiply units 920 can be integer or floating-point units. They are most preferably similar to the multiplier 440 described above, but of course other calculation units could be substituted. These units will hold the coefficients in registers, until they are commanded to read new coefficients.

911. The complex multiplier 911 is pipelined with two complex adders 912.

The inputs to the two complex adders 912 include not only the outputs of complex multiplier 911, but also data from read ports 910C, fed through delay block 940. (This delay block can optionally be used to share ports 910B and 910C on the register file 910.) The outputs of the complex adders are connected to write ports 910D.

Thus, this structure permits butterfly calculations to be pipelined.
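The data flow described above, a complex multiply built from four real multiplies followed by two complex adders, matches the shape of a radix-2 FFT butterfly. The sketch below assumes that particular butterfly form (sum and difference of one input and the coefficient-weighted other input); the text does not spell out the exact arithmetic, and the function names are hypothetical.

```python
def complex_multiply(w, b):
    """Complex product using four real multiplies and two real adds."""
    wr, wi = w
    br, bi = b
    p1 = wr * br
    p2 = wi * bi
    p3 = wr * bi
    p4 = wi * br
    return (p1 - p2, p3 + p4)


def butterfly(a, b, w):
    """Radix-2 butterfly: returns (a + w*b, a - w*b) as (real, imag) pairs.

    `w` is the coefficient held in the multiplier registers; `a` is the
    operand that bypasses the multiplier (through the delay path).
    """
    tr, ti = complex_multiply(w, b)
    upper = (a[0] + tr, a[1] + ti)  # first complex adder
    lower = (a[0] - tr, a[1] - ti)  # second complex adder
    return upper, lower


# With coefficient w = -j (0, -1) and inputs a = 1, b = 1:
upper, lower = butterfly((1, 0), (1, 0), (0, -1))
```

In this model the delay on the bypass operand simply keeps `a` aligned in time with the product `w*b`, which takes one extra pipeline stage to produce.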



The data cache memory provides a large amount of high bandwidth storage. The storage capacity currently is 2 Mbytes, and the bandwidth is 320 Mbytes per second. The memory is multi-ported, to allow data transfers with the outside world to occur in parallel with the floating point calculations. This helps prevent the calculations from occurring in a "stop-start" fashion, with the floating-point processor module 130 standing idle for long periods.

Figure 5 shows key features of the data cache memory module 140. Central to this module is a large block of memory 510. In the presently preferred embodiment, this memory block 510 is configured as 8 single-in-line modules, each containing eight 32Kx8 SRAMs, for a total of 2 megabytes of memory. However, it will be readily recognized by those skilled in the art that the memory implementation could be changed, in accordance with the changing availability of advanced semiconductor parts and the demands of a particular application.

In particular, it is contemplated that for some applications it may be advantageous to have significantly more memory. Note that the by-256 configuration preferably used for this memory bank 510 means that the address space is used economically, at least for fully parallel accesses. Thus, in the presently preferred embodiment, 24 bits of address information are provided to the memory bank 510 at address input 511. Note that the write enable input 512 is actually 8 bits wide, so that individual 32-bit words, within each 256-bit block of memory, can be selected for writing. This is advantageous, as will be discussed below. The data port 513 is 256 bits wide. Note that the functionality of block 510 does not yet provide the multiport capability characteristic of module 140 as a whole. The logic for implementation of this multiport capability, and for accessing the memory bank 510, will now be described.

At the bottom of Figure 5 are seen the 32-bit wide data buses which connect to
the control processor 110 (CD bus 112) and to the data transfer processor 120 (the TD
bus 122). Each of these buses is first fed into a holding register bank 560. Each of the
register banks 560 contains eight 32-bit wide registers 561 in parallel. In the presently
preferred embodiment, these registers 561 are each actually configured using four
74ALS652 devices, configured to provide a write holding register 561' in parallel with a

read holding register 561''. The structure of the register sets 560A, 560B, and 420 is
further shown in Figure 24.

When the memory bank 510 is accessed, an address must be provided at port
511. This address will be provided through multiplexer 520, from either the CA bus 111
(which carries addresses originated by the control processor) or the TA bus 121 (which
carries addresses originated by the data transfer processor module 120). A select input
521 chooses which of these inputs is to be provided to the address port 511.




The select signal 521 to the multiplexer 520 is generated by arbitration logic 530.
This simple logic grants access to the DTP module 120 only if the DTP is requesting
access and the CP is not requesting access. The select signal 521 is provided not only to
address multiplexer 520, but also to write mask multiplexer 580, and to DTP transfer
logic 540.
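
The grant rule just stated can be sketched in a few lines. This is only an illustrative model, not the disclosed circuit: the function name `arbitrate` and the field names `dtp_grant` and `select_dtp` are hypothetical labels chosen for this sketch.

```python
from typing import NamedTuple


class ArbitrationResult(NamedTuple):
    dtp_grant: bool   # hypothetical name: tells the DTP it owns this cycle
    select_dtp: bool  # hypothetical name: steers the address and write mask multiplexers


def arbitrate(cp_request: bool, dtp_request: bool) -> ArbitrationResult:
    """Grant the DTP only when it requests access and the CP does not."""
    grant = dtp_request and not cp_request
    # In this sketch the multiplexer select simply follows the grant.
    return ArbitrationResult(dtp_grant=grant, select_dtp=grant)
```

Under this rule the CP never has to test for a grant; only the DTP needs to check whether its request succeeded.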

As will be discussed below, the write mask input 512 is very advantageous during
writes from the TD bus 122 or the CD bus 112. Since the write enable input 512 has 8
bits of resolution, the eight 32-bit words in each block of memory 510 can be separately
enabled for writing during a single fully parallel write operation. Thus, for example, when
the control processor 110 wants to write less than eight words into one row of memory
bank 510, the registers 561 for the desired word positions will be loaded up with the
desired data values. In addition, 8 bits will be provided on the write mask input, to indicate
which of the registers 561 contain information which should be written into the
corresponding words of memory bank 510 at the row indicated by address 511 (from the
CA bus 111). (As noted above, transfer of an address from the CA bus 111 into the
multiplexer 520 is controlled by the output of the logic 530.)

Figure 25 provides a different view of the write mask logic. In this figure the FP
write mask logic 2510, CP write monitor logic 2520, and DTP write monitor logic 2530
are broken out as three separate blocks, which provide inputs to multiplexer 580. Figure
26 provides a more detailed view of the workings of the write monitor logic blocks. Inputs
to the logic 2610 include Register Select, Write All, Write DCM, and Load Holding
Register. The output is eight flag bits, registered in register 2620.

The DTP transfer logic 540 is driven by microcode instruction fields 542, which are
part of the microcode instruction sequence within the data transfer processor module 120.
Similarly, the CP transfer logic 550 is driven by microcode instruction bits 552, which
are part of the microcode instruction driven by the sequencer 210 of the control processor
module 110. (In fact, some of the microcode driven by this sequencer is preferably
distributed. That is, some of the fields of the microinstruction are stored separately from
the control store 220, but are clocked by the series of microinstruction addresses 211
which are the outputs of the sequencer 210. This provides substantial advantages in the
system context, and will be discussed below.)




The other outputs 543 and 553 of the transfer logics 540 and 550 include such
control functions as control of the respective register banks 560, including clocking and
output enable. (Note that each of the register banks 560 has two output enables, for the
two sides of the register bank, and two sets of clocks.) Note also that one of the lines
controlled by the CP transfer logic 550 is the output enable line 514 of the memory bank
510.

It will be noted that there is no direct input from the FP module 130 to request
access to the cache bank 510. This is because such accesses are controlled by the control
processor module 110. This surprising twist turns out to yield significant advantages, as
will be described below.

Mctwit (TfffilhnnnUfltt 

The accelerator subsystem uses a wide memory architecture. On each access to
the data cache memory 140, 256 bits are read or written. This represents 8 floating-point
words per cycle.

The data cache memory 140 is tri-ported to the control processor module 110,
floating-point processor module 130, and data transfer processor module 120, but because
the control processor module 110 and floating-point processor module 130 accesses are
both controlled by the control processor module 110, the arbitration and address
multiplexing only needs to be done two ways.

T?itiPirrtt 

There are three ports into the data cache memory. The port to the FP module(s)
is 256 bits wide, and the control processor module 110 and data transfer processor
module 120 each see respective 32 bit wide ports. The data routing and storage for the
32 bit wide ports is included as part of the data cache block 140.

The multiplexing of the 256 bits of data from the memory array onto one of the
32 bit buses is implemented with 32 bidirectional registers, arranged as 8 groups of 4
registers. Each group stores 32 bits (one floating-point word) in the read direction and
32 bits in the write direction, and is called a holding register. The more specific naming
of each register is read holding register and write holding register, as seen from the
processor sides of the interface.

When data is read from the memory array, all 256 bits are stored in the holding
registers, and the enables of these registers are controlled to select the required
floating-point word onto the 32 bit port.

When data is written to the memory array, only those registers that have been
updated from the 32 bit port are stored. This is controlled by the write mask logic and
is achieved by using 8 write enables, one per group.

Both 32 bit ports have identical data routing and storage logic.

The 256 bit port to the floating-point processor module 130 contains
similar logic to the 32 bit ports, but is located on the floating-point processor module
130. To allow future expansion of the data cache memory, using modules, the address
bus (24 bits) and write enables (8) are taken to the module connectors 3810 (shown in
Figures 38A and 38B).



The CP transfer logic is responsible for the transfer of data between the CP
holding registers (or the FP holding registers) and the data cache memory.

The data in the holding registers is accessed when the CD source microcode field
selects the read holding registers. The least significant 3 bits of the CP address bus select
the 32 bit word to drive onto the bus. During this process the data cache memory isn't
used, but it could be accessing the next set of data if necessary.

To write data into the write holding registers the CD destination microcode field
selects the holding registers as a group, and the least significant 3 bits of the CP address
bus CA 111 select the 32 bits to update. When a write holding register is updated, a
corresponding write flag is set. Therefore, when a write to the data cache memory is
done, only the holding registers that have been updated by the control processor module
110 are actually written into the memory array. Those words in the memory array for
which the corresponding holding register had not been updated are not changed. The
write flags are all reset when the data cache memory is written to (if the data source is
the control processor module 110). If the control processor module 110 had been updating
one of the write holding registers during the same cycle that it had been writing the
data cache memory, then that write flag bit would remain set.

Sometimes it is advantageous to by-pass this selective write mechanism, for
example when clearing memory to a constant value. In this case the control processor
module 110 can override the selective writing, and force all words to be written. Without
this selective write capability the write operation of the data cache memory would be very
slow, and would involve: reading the block of data (256 bits) into the read holding
registers, transferring the words that were not to change to the write holding registers,
updating the write holding register(s) with the new data, and then doing a data cache write
cycle. In the current architecture the copying of data from the read holding registers to
the write holding registers would take one cycle per word.

The state of the write flags can be extracted non-destructively by the control
processor module 110, for the purposes of state save during microcode debugging.

The read holding registers are separate from the write holding registers so
multiple read cycles can be done without disturbing the contents of the write holding
registers, and vice versa.
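
The selective write mechanism described above (eight write holding registers, one write flag each, and an override that forces all words to be written) can be modeled roughly as follows. This is a behavioral sketch under stated assumptions, not the hardware: the class and method names are hypothetical, the memory array is modeled as a list of 8-word rows, and the corner case of an update in the same cycle as the write is ignored.

```python
WORDS_PER_ROW = 8  # 256-bit row = eight 32-bit words


class WriteHoldingRegisters:
    """Hypothetical model of one bank of write holding registers."""

    def __init__(self) -> None:
        self.words = [0] * WORDS_PER_ROW
        self.flags = [False] * WORDS_PER_ROW  # one write flag per register

    def load(self, address: int, value: int) -> None:
        # The least significant 3 address bits pick the 32-bit word.
        index = address & 0b111
        self.words[index] = value & 0xFFFFFFFF
        self.flags[index] = True

    def commit(self, memory: list, row: int, write_all: bool = False) -> None:
        # Only flagged words reach the array, unless write_all overrides.
        for i in range(WORDS_PER_ROW):
            if write_all or self.flags[i]:
                memory[row][i] = self.words[i]
        # The flags are reset once the cache write has been done.
        self.flags = [False] * WORDS_PER_ROW
```

For example, loading a single word and committing leaves the other seven words of the row untouched, whereas committing with `write_all=True` overwrites the whole row, which is the behavior wanted when clearing memory to a constant.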

To control the transfer of data between the holding register sets and the data
cache memory, the following microcode bits are used:

Data cache access flag: This bit is active whenever an access to the data
cache memory is required by the control processor module 110 for its own use or to
transfer data to or from the floating-point processor module 130. The access flag is not
registered. Therefore, arbitration with the data transfer processor module 120 can
be sorted out before the start of the cycle the request happens in.

Data cache write enable: This bit generates a write cycle in the data cache
memory.

Data cache write all: This bit overrides the normal write enable gating
that allows selective updating of words in the data cache memory and forces them all to
be written. This is useful when setting blocks of memory to a constant value.

Data cache port select: This bit selects either the FP module holding
registers or the control processor module 110 holding registers to be the source or
destination for a data cache transfer.

There are three bits in the mode register that control the holding registers. Two
bits select whether the holding registers are to be used or by-passed. The third bit
disables the data cache memory from driving the DCM data bus, so a loopback data path
can be set up between the write holding registers and the read holding registers. These
facilities are only present so the state save and restore microcode can gain access to the
write holding registers without doing a data cache memory write.



The control processor module 110 can use the data cache memory in two ways.
The first way is to ignore the wide memory architecture and treat it as
if it were just 32 bits wide. To do this, the CP module 110 simply requests an access
cycle prior to every read access and after every write access. Using this method, the data
cache memory can be regarded as just a memory with pipelined data accesses. This
method simplifies using the data cache memory, but does not make efficient use of the
memory's ability to service the data transfer processor module 120 port. This method also
introduces inefficiencies when the control processor module 110 is accessing sequential
data. However, for non-sequential data accesses the next method cannot be used in any
case, so this first method must be used.

When the control processor module 110 is doing sequential memory accesses,
it takes it 8 cycles of reading or writing to all the holding registers 561 for each
access to the memory bank 510. The data cache memory access can be pipelined up with
the holding register accesses, so 7 out of 8 cycles are free for the data transfer processor
module 120 to use. The data cache memory access does not occur automatically, so the
microcode has to specify an access cycle every 8 cycles. This type of transfer is more
likely to occur in the data transfer processor module 120, because I/O transfers to or from
the external interfaces are generally sequential.

The control processor module 110 is also responsible for transferring data between
the data cache memory and the holding registers on the FP module. In this case the
basic control is the same, except for determining which words within a block to update
during a write to the data cache memory. In this instance a different approach is taken
to the write flags as described above.

The differences arise because of several factors:

The transfer logic that governs the data flow between the FP's register
file and the holding registers has some limitations, so the more general write mask
generator used in the control processor module 110 is not necessary.

The normal data transfers from the FP's register file are usually blocks
of data (i.e. part of a vector), and since this happens in a single transfer cycle a number of
the write mask bits must be set in parallel rather than individually as in the case of the
control processor module 110.

The one FP write mask generator must cope with multiple FP modules.

The FP write mask is generated by specifying the word to update and the
number of consecutive words from the first word. The start position is given by the least
significant 3 bits of the CP address and the length is held as a field in the
microcode instruction.


The data transfer processor module 120 transfer logic is responsible for the
transfer of data between the data transfer processor module 120 data bus (TD bus 122)
and the memory array. It is very similar to the CP transfer logic except:

The ports associated with the floating-point processor module 130 are
not included.

The output signals are qualified by the results of the arbitration logic.



The arbitration logic determines who has access to the data cache memory on a
per cycle basis. The two competing ports are the CP/FP and the data transfer processor
module 120. The CP/FP has priority over the data transfer processor module 120, so the
data transfer processor module 120 is made to wait for a free memory cycle. The data
transfer processor module 120 can force the control processor module 110 to inject a free
cycle by interrupting the control processor module 110.

The arbitration of the data cache memory has been simplified by both ports'
requests (or demand in the CP/FP case) being synchronous. This has been
achieved by sharing the same clock generator between the control processor module 110
and data transfer processor module 120. Without this degree of synchronization, the
control processor module 110 could never assume it had access during a cycle, because
the data transfer processor module 120 might have just started an access.

The cycle by cycle arbitration is done in the arbitration logic 530. The arbitration
logic takes two request signals: CP request and DTP request. Both these are microcode
bits that are asserted whenever that port accesses the data cache memory. These
bits are non-registered so that the arbitration can be sorted out in the cycle before the
access occurs. This allows enough time for the data transfer processor module 120 grant
signal to be tested by the data transfer processor module 120, allowing
for the pipelining of the sequencer's FLAG input.

The two output signals are the DTP grant signal, which informs the data transfer
processor module 120 that it has access to the data cache memory, and a signal that
controls the address and write enable multiplexers.

The CP/FP accesses the data cache memory as if it were a single ported memory.
The data transfer processor module 120, however, must go through the
following procedure every time it requires access. This procedure is written in pseudo code:



{ do some writes to the holding registers }
WAIT: { Request write access to data cache memory;
        if access failed jump to WAIT, else continue }
{ do some other work }



Some points to note regarding this procedure:

This logic helps to maintain a large amount of work going on in parallel.

If the access failed, then the write (or the loading of the holding registers
on a read access) is automatically inhibited.

The result of the test indicates whether the access was successful or not.
If it was not, then the data transfer processor module 120 tries again by looping on the
accessing instruction.


This example has shown the data transfer processor module 120 waiting until
access is granted. However, it would normally wait only for a certain number of
cycles. If access still had not been granted, the DTP module would then interrupt the
control processor module 110. During the few cycles the control processor module 110
takes to service the interrupt, the data cache memory would be free for the data transfer
processor module 120 to access.
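
The request, retry, and interrupt behavior of the DTP port can be sketched as a small simulation. This is not the microcode itself: the function name `dtp_request_access`, the per-cycle list of CP activity, and the `max_wait` parameter are hypothetical, and the model assumes (as a simplification) that the cycle in which the interrupt is raised is immediately free for the DTP.

```python
from typing import Sequence, Tuple


def dtp_request_access(cp_busy_by_cycle: Sequence[bool], max_wait: int) -> Tuple[int, bool]:
    """Return (cycle at which the DTP gains access, whether it interrupted the CP).

    cp_busy_by_cycle[i] is True when the CP/FP port claims cycle i.
    """
    for cycle, cp_busy in enumerate(cp_busy_by_cycle):
        if cycle >= max_wait:
            # Waited long enough: interrupt the CP, which then leaves the
            # memory free while it services the interrupt (simplified).
            return cycle, True
        if not cp_busy:
            # The CP has priority, so the DTP gets only cycles the CP leaves free.
            return cycle, False
    raise RuntimeError("ran out of simulated cycles before access was granted")
```

With a CP that leaves the third cycle free, the DTP simply loops twice and then proceeds; with a CP that never yields, the wait is bounded by `max_wait`.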

The command memory 190 provides communication between the control processor
module 110 and the data transfer processor module 120. Both have equal access to it.
Dual port RAMs are used in the presently preferred embodiment. The command memory
is 32 bits wide by 2K deep.

Figure 15 shows some significant features of the organization of this memory. The
operation of the command queues is described in greater detail
below, where the processor interface between the CP module and the DTP
module is discussed. However, some key features of the organization of this memory will
be noted at this time.

These dual port RAMs allow unrestricted access by both ports when the two
addresses are different. If the two addresses are equal, and if both sides are writing, then
the result is undefined. As discussed below, the communication protocol between the two
processors is arranged so that both never need to write to the same address. Therefore
no arbitration is necessary.
20 Sollwere cootroto bow the control rrnffMre modide 110 

mod^ thee ll o cntl one wffl hichide; command qmoe to the CP module 110 fe ^ ^mma ia% 
of memety ^eoe); command queue to the DIP module 120 (e.g. about aft% cfmenney 

25 The lit i ^ eanre and r eM ore data atmeture ia geaenffed for hy thm figgfgffMip 

d^ug monitor, to hold the control prooeeeor module 110 
130 otnto intematloa (in wea aa I 



Figure 6 shows principal components of the host interface logic, which is generally
shown as block 160 in Figure 1. In the preferred embodiment, the system bus is a VME
bus, and therefore this interface logic is often referred to in the present disclosure as the
"VME Interface." However, as will be apparent to those skilled in the art, a wide variety
of other system buses could be used instead, and the disclosed innovations can readily
be adapted to such systems.

A bus controller 650 interfaces to the VME bus services lines, to provide such
interface signals as bus grant, bus request, reset, etc.

The interrupt logic 680 is used for interrupt handling, to send interrupts to the
host. (These interrupts will go out on the VME bus services lines 600C.) In the presently
preferred embodiment, this is implemented using a PAL, as described below.

In addition, a DMA controller 640 is also preferably provided. This provides low
level control of data handling between the VME bus and the FIFO 670, without
supervision of all intervening steps by the data transfer processor 120. In the presently
preferred embodiment, the DMA controller is configured using a PAL, as described below.

The VME interface provides four main services to the host processor:

Microcode loading via a serial scan loop interface to the three types of
processor module.

Data transfer within the VME address space, using DMA access to the
host memory so the subsystem can transfer its own data.

Debug (hardware and software) facilities.

The internal connections of this interface logic include: the TD bus 122 (for data);
the TA bus (for address information); the CP microaddress bus 211B; the DTP
microaddress bus 311B; the serial microinstruction loop 225; and numerous interrupt and
status lines.

The external connections, in this embodiment, are to a VME bus. The lines of
this bus are depicted separately, in Figure 6, as address lines 600A, data lines 600B, and
bus services lines (including status and control lines) 600C.

and as IEC 821. The interface block 160 will accept 32 or 24 bit addresses and 32 or 16
bit data. In the presently preferred embodiment some minor restrictions have been
placed on the type of accesses available, to keep the addressing and data routing
simple.

Alternatively, a wide variety of other bus configurations could be used instead.
For example, VersaBus, FutureBus, Multibus II, or NuBus could be readily designed into
the system if desired. For very high-speed computing systems, it might be advantageous
to use optical busses, using modulated solid-state lasers on optical fibers.



Hm logte blocfca liHileh hiterflm moat direct^ to the VMS biM win bai 
first. Other kgfe and memoiy blo^ wffl ba described thmaaar 
wffl ba daacribed bM*; the deecriptkm of this reglte block con^ 
which ftirthar dariflea tha opefatka of the < 



060 interfittea to the bi» 
decoder fifiS wUcfa 



ttnea eOOGi and ate ] 

tUapartinilar boafdto 



IfaMofthaVlCBbia 
sped&ed by tho boat 
Tha aetual decodhig of the 
can ba eoofigurad to fit 



The decoder 662 la eonotant^ imtchhig tba 
to provida thia decode outpoL Tlia adfraa of the 
ia satbythauaer atfaMtalktaoQ. wingDIL 
Bodea ia dooa hk PAIj^ so tfas 
into tha target VlIB ^stam sas^f. 

The bus controller 650 provides enable signals to the bidirectional data buffer 620
or the bidirectional address buffer 630, in accordance with the VME control protocols.

The bus controller 650 is also connected to receive status information from the
DMA controller 640 and the VME interrupt logic 680 (and also from other logic blocks,
as will be described in detail below). The bus controller 650 is also connected to send
control signals to the DMA controller 640, the VME interrupt logic 680, and to other
logic blocks (as will be described in detail below). Since the status and control
connections of the bus controller 650 are extensive, they are not all separately shown,
to avoid possible confusion. However, their connections will be readily apparent to
those skilled in the art.

In the presently preferred embodiment, this is implemented as a VME bus
controller device (Signetics SCB68172). This handles all the bus protocols, including
arbitration for the master interface and bus error cycles.



Master and Slave Modes

The VME interface can be considered as two fairly separate interfaces: a slave
interface and a master interface. Implementation of the master mode is described below,
with reference to the DMA controller 640.

The slave mode is implemented using slave address decoder 632. When the
accelerator subsystem is operating in slave mode (as determined
by bus controller 650), the controller 650 puts the bidirectional buffers 620 and 630 in a
pass-through mode, and enables the slave address decoder. The slave address decoder then
decodes the address brought in from the VME address lines 600A, and enables the
appropriate device. Since the outputs of the slave address decoder are widely
connected, they are not separately shown.

The slave address decoder also contains the necessary DTACK* generation logic
to comply with VME protocols.

Under the VME protocol, the current bus master addresses a board, and that
board can only respond in slave mode, because only one active master is allowed at
one time. (There can be many masters waiting to be granted access to the bus and hence
become active.) The master then waits until the slave responds with DTACK* (data
transfer acknowledge) to say it has taken the data (write operation) or has provided the
data (read operation).



This is a bidirectional buffer, which provides direct interface to the VME data
lines 600B.

This is a bidirectional buffer, which provides direct interface to the VME address
lines 600A.



This memory provides a block of storage in the interface 160. A number
of uses of this memory are described in connection with the operations of the DTP
module 120.


MwwffT Map 

Each aoeelermtar flubqratem usee 8K bytee of VBCB <ddrm spm. The 
10 eddreeeof thii a ddreee q>ee« i» eetoctod by 8 ewildiML the : 
as «0 oAet from this base aiMrroa The m emflc j map for the i 
ba bfftAan taito 3 araaee 

nanocy eiw la eoBti^Oed by softwanu SoM of the data at^^ 
16 typical cQotatai wffl ba mentteed. 

TUa area ta uaad for many »"*p^^ tkEWM^thm^ aa wfll ba *tH'*H in < 
The maiBOfy area k iharad ba*iieea the miemode debogipr aa^ 



Tte dehugver area wiB oootafai the state sate tatenwaoQ of the 
B waO aa a mmmand queue which parmita the monitor j^fcrimnlft to read 
y, FXFOa etc 

The run thae inter fim oonahfta maintf ei a foimnwnd queue that the 
device driw can add to and the bgt« mi ewMM^ 
There are several restrictions on how the hardware can be accessed. These
restrictions are imposed primarily to keep the hardware simple, while still allowing 16
or 32 bit data bus interfaces. The restrictions are: byte accesses are not supported; and
16 bit accesses must occur on long word (32 bit) boundaries.

The memory 660 and the data FIFO 670 are 32 bits wide. If the host
system is a 16 bit system, the top 16 bits are not accessible. For a 16 bit system to write
to consecutive addresses in the memory, the address must be incremented by 4 to move
onto the next location.
Register                 Offset   Width (bits)   Access
Control register         0        16             read/write
Strobe buffer            4        16             write
Status register          4        8              read
WCS control register 0   8        16             read/write
WCS control register 1   12       16             read/write
WCS data register        16       16             read/write
CP microaddress          20       16             read/write
DTP microaddress         24       16             read/write
Data FIFO                28       32             read/write
IF memory 660            4096     32             read/write



The data FIFO 670 provides an important capability in the data transfer
operations.

In the normal mode of operation the data FIFOs are never accessed by the host,
because the DMA controller uses them exclusively. The host can gain access to them by
means of the FIFO access bit in the control register.

The block shown as FIFO 670 is physically implemented as two FIFOs, to gain
the functionality of a bidirectional FIFO. One of these FIFOs is read by the host, and the
other is written by the host. The other ends of the FIFOs are accessed by the DTP.
(Thus, in general, data read by the host after writing to the FIFOs
would be different from the written data.)

When the host is accessing the FIFOs it must monitor the FIFO status, to ensure
that a FIFO is never read when empty or written to when full. (The host might need to
access these FIFOs for diagnostics, or if polled I/O rather than DMA was required.)

30 



The VME protocol provides for a number of interrupts. These interrupts can be
triggered by the DTP module 120.

The DTP module 120 also defines the interrupt vector. The vector can be
changed depending on the reason for the interrupt, or a single vector can be used, with
the cause(s) of the interrupt held in the VME interface memory 660.

Sequential or block mode transfers, between the data FIFO 670 and the VME bus,
are supported by the DMA controller 640. (This controller also supports the more usual
single word transfers.) The DMA address is the full 32 bits, and the VME address
modifier and LONG* signals used during a transfer are all set up by the DTP module
120 in registers before the transfer starts.

The opposite side of the FIFO 670 is filled or emptied by the DTP module 120
(normally into the data cache memory 140). When 16 bit transfers are used, the DTP
microcode packs/unpacks the data to/from the 32 bit internal format.

This part is referred to as a DMA controller by analogy, in that it can perform
block data transfers to and from the FIFO 670 in response to a single high-level
command from the DTP module 120. However, the functionality of this logic is not quite
the same as that of commercially available DMA controller chips. Normal DMA
controllers will get their data and address information from the same bus as the one they
use for DMA access when active. However, the DMA controller 640 receives its address
information from the DTP module 120, and uses this information to control the address
and data interface to the VME bus.

In the presently preferred embodiment, the DMA controller 640 is actually
implemented using four Am2940 DMA bit slice chips, with some associated logic in PALs,
as discussed below.

The setup of the DMA controller is done by the DTP module 120, and the data
is transferred between the VME bus lines 600B and the data FIFO 670.

Three addressing modes are available. Which of these is used will depend on the
type of transfer or system configuration.

Maintain constant address: In this mode the DMA controller outputs the same
address to every DMA access to the VME memory, and this is used when accessing
FIFOs or I/O ports.

Increment address by 2 (or decrement): This addressing mode is used
when the VME memory being accessed is only 16 bits wide. In this case the DTP splits
or merges the data between 32 bit words used internally and 16 bit words used
externally.

Increment address by 4 (or decrement): This addressing mode is used
when the VME memory being accessed is 32 bits wide.
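
The three addressing modes amount to a choice of address step (0, 2, or 4 bytes per access, with an optional decrement). The sketch below only illustrates that address sequence; the enum and function names are hypothetical, and wrapping at 32 bits is an assumption based on the statement that the DMA address is the full 32 bits.

```python
from enum import Enum
from typing import List


class AddressMode(Enum):
    CONSTANT = 0  # same address every access (FIFOs, I/O ports)
    STEP_2 = 2    # 16 bit wide VME memory
    STEP_4 = 4    # 32 bit wide VME memory


def dma_addresses(start: int, count: int, mode: AddressMode,
                  decrement: bool = False) -> List[int]:
    """List the VME addresses generated for `count` consecutive accesses."""
    step = -mode.value if decrement else mode.value
    # Assumption: addresses wrap modulo 2**32.
    return [(start + i * step) & 0xFFFFFFFF for i in range(count)]
```

For instance, `dma_addresses(0x1000, 3, AddressMode.STEP_4)` yields three addresses four bytes apart, while `AddressMode.CONSTANT` repeats the starting address for every access.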

Of course, multiple status signals are preferably used to control
the FIFOs, as is well known to those skilled in the art. For example, such status signals
would include FIFO empty, FIFO half-full, etc.



This logic provides the interface to the microaddress busses 211B and 311B, and
to the serial loop 225. (More precisely, as shown in Figure 28, this logic provides one
serial output line 225A, and receives four return lines 225B, 225C, 225D, and 225E.)
The components of this block, and the functions it performs, are discussed in detail below
(in connection with the operation of the serial loop interface), with reference to Figures
27, 28, and 29.

Note that this logic must access the CP and DTP microaddress registers in the
register block 612. It also accesses the WCS interface control register. These registers
are shown in register block 612, but could alternatively be regarded as part of control
logic 610.

This block includes a flip-flop 2720, a state machine 2740, a multiplexer 2710, and
the WCS data register 2730 (which is a shift register).



The host uses the control register to control the basic operations of the subsystem
hardware. These include:

CP sequencer reset: This bit when set forces the CP sequencer 210 to
jump to address 0, and resets the internal sequencer state.

DTP sequencer reset: This bit when set forces the DTP sequencer 310 to
jump to address 0, and resets the internal sequencer state.

DTP reset: This bit when cleared places the DTP in a safe state, so that
all the buses are tristated. The main use of this is when loading microcode, to prevent
bus contention on illegal microcode instructions.

CP reset: This bit when cleared places the CP in a safe state, so that all
the buses are tristated. The main use of this is when loading microcode, to prevent bus
contention on illegal microcode instructions.

FP reset: This bit when cleared places the FP in a safe state, so that all
the buses are tristated. The main use of this is when loading microcode, to prevent bus
contention on illegal microcode instructions.

VME FIFO reset: This bit when cleared sets the VME data FIFOs to the
empty state.

Data Pipe FIFO reset: This bit when cleared sets the data pipe FIFOs
to the empty state.

GIP FIFO reset: This bit when cleared sets the GIP interface FIFOs to
the empty state, and initializes the GIP interface.

Free run clocks: This bit controls the CP and DTP microcode clocks, and
either allows them to free run or stops them. When the clocks are stopped they can be
single stepped by the host.

Disable clocks: This bit disables all the microcode clocks for the CP and
DTP except the clock to the pipeline registers. This is necessary to allow the pipeline
registers to be read or written without disturbing the state of the CP or DTP.

Free run FP clocks: This bit controls the FP microcode clocks, and either
allows them to free run or stop.

FIFO access: This bit controls the access to the VME data FIFOs. The
normal option is to let the internal DMA controller have exclusive access and control
rights to them, but for diagnostics or in a VME slave only environment the host can take
control of these FIFOs by setting this bit.

Microcode loop: This bit is only used by the diagnostics, to cause a test
to repeat itself at the microcode level.



The host uses the strobe buffer to control aspects of the subsystem that are edge
or pulse related. If the strobe buffer is written to, then for every bit that is set a
corresponding strobe line will be pulsed. This automatic strobing relieves the host from
having to toggle a strobe line by first setting it and then clearing it. This action occurs
in the write mode only; if the host reads this buffer, it will receive some alternative
status information back.
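
By way of illustration only, the write-to-pulse behaviour of the strobe buffer can be sketched as below. The bit positions and the strobe names in STROBE_NAMES are hypothetical; the text does not give the actual bit assignments.

```python
# Hypothetical bit order for the strobe lines; the real assignment is not stated.
STROBE_NAMES = [
    "single_step",
    "fp_pipeline_clock",
    "cp_wcs_write_enable",
    "fp_wcs_write_enable",
    "cp_debug_interrupt",
    "dtp_debug_interrupt",
    "dtp_interrupt",
]


def strobes_pulsed(write_value, names=STROBE_NAMES):
    """Return the strobe lines pulsed by one host write to the strobe buffer.

    Every bit that is set in the written value pulses the corresponding line
    once, so the host never has to set a bit and then clear it again.
    """
    return [name for bit, name in enumerate(names) if write_value & (1 << bit)]


pulsed = strobes_pulsed(0b0000101)
```

A single write of 0b0000101 pulses the lines at bit positions 0 and 2 in this sketch; no second write is needed to release them.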

The strobe lines are:

Single step: This will step the CP and DTP microcode clocks
through one cycle. This is used when hardware single stepping and loading, reading or
modifying the WCS.

FP pipeline clock: The FP pipeline clock signal is only used for
serial microcode loop control when reading back the contents of the FP's WCS. The run
time pipeline clock in the FP is the same as the normal FP microcode clock.

CP WCS write enable: This signal causes the CP's WCS 220 to be written
with the data previously loaded into the serial loop at the address specified in the CP
microaddress register. This is qualified by the load WCS mask for the parts of the CP
WCS that lie on the FP modules. A similar signal is used for write enable of the DTP
WCS 320.

FP WCS write enable: This signal causes the FP's WCS 470 to be written
with the data previously loaded into the serial loop at the address specified in the CP
microaddress register. Note that the CP microaddress register is used. The writing into
the WCS 470 is qualified by a load-WCS mask, so that only the selected FPs have their
WCS updated.

CP debug interrupt: This strobe generates an interrupt in the CP. This
is used by the microcode debug monitor to force the CP to return to the debug
monitor.

DTP debug interrupt: This strobe generates an interrupt in the DTP.
This is used by the microcode debug monitor to force the DTP to return to the debug
monitor.

DTP interrupt: This strobe generates an interrupt in the DTP. This is
used by the device driver to notify the DTP that a command has been loaded into its
queue.



The status register is read only and it is mainly used to allow the host to
determine the VME data FIFOs' status when the host has access to them.
The status bits are:

VME output FIFO status: The three status bits that this FIFO generates
are full, half full and empty. These status bits are for the FIFO that the host reads from
(if its access is enabled).

VME input FIFO status: The three status bits that this FIFO generates are
full, half full and empty. These status bits are for the FIFO that the host writes to (if
its access is enabled).

Here: This status bit allows the host to determine if any FP modules are
present. To do this it writes each module's address into the WCS control register 1 and
tests this status bit. The state of this status bit then shows whether there is a module
at this address.



WCS Control Registers

Two registers are used to control the WCS interfaces. The first one controls the
reading and writing of the various microcode memories in the CP, DTP and on the FP
modules. More detail on the function and use of these signals is given below.



The control signals in this register are:
Serial loop output enable: This is the most significant bit of a 3 bit field
that selects which branch of the parallel paths of the serial loop is to act as the return
path.
microcode memory which needs to be enabled for normal microcode execution and
microcode readback, but disabled when loading microcode.
FP pipeline output enable: In the presently preferred embodiment, the
FP WCS 470 is split into two banks for optimum microcode loading (as discussed below).
The present signal controls the pipeline registers 478 which interface to the output of
these two banks.

FP WCS mode: This controls the serial loop mode, that is, the shifting of data
around the loop and the transfer of data to/from the WCS.

CP and DTP pipeline register output enable: This signal acts on
the microcode instruction and "forces" all the bits to go high.

CP WCS output enable: This bit output enables the data from the
microcode memory 220, which needs to be enabled for normal microcode execution and
microcode readback, but disabled when loading microcode. A similar signal controls
DTP WCS 320's output.






CP WCS mode: This controls the serial loop mode, that is, the shifting of data
around the loop and the transfer of data to/from the WCS.

DTP WCS mode: This controls the serial loop mode, that is, the shifting of data
around the loop and the transfer of data to/from the WCS.

CP microaddress select: This forces the CP's sequencer to tristate its
bus and enables the CP microaddress register to drive the bus instead.

DTP microaddress select: This forces the DTP's sequencer to tristate its
bus and enables the DTP microaddress register to drive the bus instead.

FP microaddress select: This selects the CP microaddress
as the address source for the FP's WCS. Normally the CP microaddress select is
set up so that the host is supplying the microcode address to the CP and hence the FP.

FP upper/lower WCS select: The FP WCS must be treated as two halves when
loading, because of the data routing imposed by the parallel load feature. This bit
selects the lower 64 bits or the upper 40 bits.

Serial loop return source: The serial loop can return data
from one of 4 sources. (It must be set up to select that source when the WCS contents
are read via the serial loop.) The possible sources are: CP internal (on the base board
only); CP external (on the base board and the FP module); DTP; and FP.

Serial loop mode: These bits control how the serial loop behaves when
data is written or read from the WCS data register. The options are: Hold data; Shift
data; Pulse data. The effect of these is discussed in the serial microcode load section.

The other register holds the fields to control the loading and reading of microcode
on the FP modules. The two fields to control this are:

WCS load mask: Each bit of the mask enables the loading of microcode
into the corresponding module. Any number of bits can be set, so any mix of modules can
be loaded with the same microcode in parallel.

Serial loop output enable: These are the remaining two bits that,
together with the third bit in WCS control register 0, select which one of the modules
drives the CP external return path and the FP return path of the serial loop.
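
As a minimal sketch of how the 3 bit serial loop output enable field is split across the two control registers, the following combines the most significant bit (held in WCS control register 0) with the remaining two bits (held in the other register). The function name and argument names are hypothetical, and the mapping of field values to modules is not stated in the text.

```python
def serial_loop_output_select(msb_from_reg0, low_bits_from_reg1):
    """Assemble the 3 bit serial loop output enable field.

    msb_from_reg0: the single most significant bit, from WCS control register 0.
    low_bits_from_reg1: the remaining two bits, from the other control register.
    """
    if msb_from_reg0 not in (0, 1):
        raise ValueError("most significant bit must be 0 or 1")
    if not 0 <= low_bits_from_reg1 <= 0b11:
        raise ValueError("low field must fit in two bits")
    return (msb_from_reg0 << 2) | low_bits_from_reg1


field = serial_loop_output_select(1, 0b10)
```

The host would therefore need to update both registers to change the full field; this sketch only shows how the bits combine.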



The WCS data register is the register the host reads and writes to access the
serial loop, and hence the microcode memories. To make microcode loading more
efficient, this register behaves in different ways depending on how the serial loop mode
field in the WCS control register 0 is set up.

If the serial loop mode is set to "hold", then this register is read and written
like any other register.

If the serial loop mode is set to "shift", then after every read or write operation
to the WCS data register the register is shifted 16 places, which inserts the written data
into the serial loop and loads the last word in the loop into the data register.

If the serial loop mode is set to "pulse", then the register is read and written like
any other register, but after the write operation some control signals are generated
to control the serial loop.
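
The three modes can be pictured with the behavioural sketch below. It is an illustration under stated assumptions, not the hardware design: the serial loop is modelled as a simple queue of 16 bit words, the class and attribute names are hypothetical, and the control signals generated in pulse mode are reduced to a counter.

```python
from collections import deque


class WcsDataRegister:
    """Toy model of the host-visible WCS data register (assumed 16 bits wide)."""

    def __init__(self, loop_words=()):
        self.loop = deque(loop_words)  # words in the serial loop; left end returns first
        self.value = 0
        self.mode = "hold"             # "hold", "shift" or "pulse"
        self.pulses = 0                # stand-in for the generated control signals

    def write(self, word):
        self.value = word & 0xFFFF
        self._after_access(is_write=True)

    def read(self):
        word = self.value
        self._after_access(is_write=False)
        return word

    def _after_access(self, is_write):
        if self.mode == "shift":
            # Shift 16 places: the register contents enter the loop and the
            # word at the returning end of the loop lands in the register.
            self.loop.append(self.value)
            self.value = self.loop.popleft()
        elif self.mode == "pulse" and is_write:
            self.pulses += 1
        # "hold": behaves like any other register, nothing extra happens.


reg = WcsDataRegister([0xAAAA, 0xBBBB])
reg.mode = "shift"
reg.write(0x1234)       # 0x1234 enters the loop, 0xAAAA arrives in the register
first = reg.read()      # returns 0xAAAA, then the loop shifts again
```

In this model a sequence of shift-mode writes streams microcode words into the loop, and a sequence of shift-mode reads streams them back out.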



CP Microaddress Register

This register holds the data which is to be driven onto the CP microcode address
bus 211B by the microcode load control logic 610 during microcode loading of the CP or
FP modules. If the CP microaddress select bit is set in the WCS control register 0, then
reading this register will return the last data written to it; otherwise an asynchronous
snapshot of the address the CP's sequencer is outputting is returned.



DTP Microaddress Register

This register holds the data which is to be driven onto the DTP microcode
address bus 311B by the microcode load control logic 610 during microcode loading of the
DTP modules. If the DTP microaddress select is set in the WCS control register 0,
then reading this register will return the last data written to it; otherwise an
asynchronous snapshot of the address the DTP's sequencer is outputting is returned.

The data pipe concept provides a means for a number of separate accelerator
subsystems to be connected in a wide variety of topologies. This connection is done using
multiple local busses which are referred to as "data pipes." This connection is independent
of the backplane, and can be done over a reasonable distance.

In the presently preferred embodiment, each data pipe local bus supports 32 bit
wide transfers at 40 Mbytes per second, and is FIFO buffered at the receiving end. Each
subsystem contains two input pipes and one output pipe. The output pipe has separate
clocks, so when it is daisy chained to 2 input pipes the data can be routed to each input
pipe individually or together.

The data pipe interface 150 is shown in Figure 7. The data pipe output port 730
is 32 bits wide. This port can be connected to the input port (710 or 720) of the data
pipe interface on another accelerator board 4140 (or to a data pipe interface on another
device of some other type). The receiving end of a data pipe is FIFO buffered (using
FIFOs 740 and 750), so the output 731 just electrically buffers the data. Two output
strobes are provided, so that one data pipe interface can write to two other subsystems. To
prevent data overrun in the receiving subsystem, the FIFO full flags 770 from the
receiving system are available to the sending subsystem for monitoring. Two input FIFOs
740 and 750 are provided for the two input ports 710 and 720, so two subsystems can
send data to the one receiving subsystem.

The FIFO output enables are controlled by the TD source field in the DTP
microcode, and the output strobes are controlled by the TD destination field. The input
FIFOs' status signals 780 can be tested by the condition code logic or can generate an
interrupt.

Using this interface structure, multiple subsystems can be linked by local busses
in a wide variety of topologies. This ability to do flexible subsystem reconfiguration is
particularly advantageous in combination with subsystems as shown in Figure 1,
since an application-customized macroscopic data transfer architecture can be very
advantageous for many applications. Some examples of the topologies are described below.

Some algorithms or applications can benefit from a parallel or pipelined
arrangement of multiple subsystems, to distribute the calculation workload. For example,
one example of a high performance 3D graphics workstation configuration is shown in
Figure 36.


A straight daisy chain of several subsystems (as shown in the drawings) can be used
to share data, where the "master" subsystem 4150A acquires the data from the host
memory, for example, and shares it with all the other subsystems 4150B, 4150C, 4150D
via the data pipe connections. This will save on the host bus 4110 bandwidth, because
only one subsystem will be fetching the data rather than each one getting its own copy.

The data pipes could be connected into a ring (Figure 35).

The contents and meaning of the data sent on the data pipes is under software control.

The data pipes were designed for intersubsystem communication, but they can
connect to other peripherals. While the sustained I/O rate is 40 Mbytes per second, the
burst input rate is much higher. The burst input rate is limited by the electrical aspects
of the cabling, but can be as high as 160 Mbytes per second for one data pipe input, and,
when both inputs are paralleled, up to 320 Mbytes per second with suitable buffer cards.
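
The rates quoted above can be related by simple arithmetic. The sketch below assumes that a Mbyte is 10^6 bytes and that every transfer carries a full 32 bit word; the function and constant names are illustrative only.

```python
BUS_WIDTH_BYTES = 4                  # 32 bit wide data pipe
SUSTAINED_BYTES_PER_S = 40_000_000   # 40 Mbytes per second, Mbyte assumed to be 10**6 bytes
BURST_ONE_INPUT_BYTES_PER_S = 160_000_000


def words_per_second(bytes_per_s, width_bytes=BUS_WIDTH_BYTES):
    """Number of bus-wide transfers per second at a given byte rate."""
    return bytes_per_s // width_bytes


def transfer_time_s(n_bytes, bytes_per_s=SUSTAINED_BYTES_PER_S):
    """Time to move n_bytes at a given rate, ignoring FIFO stalls."""
    return n_bytes / bytes_per_s


sustained_words = words_per_second(SUSTAINED_BYTES_PER_S)   # 32 bit words per second
one_mbyte_time = transfer_time_s(1_000_000)                 # seconds for 1 Mbyte
burst_ratio = BURST_ONE_INPUT_BYTES_PER_S // SUSTAINED_BYTES_PER_S
```

Under these assumptions the sustained rate corresponds to ten million 32 bit transfers per second, and the single-input burst rate is four times the sustained rate.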

It should be recognized that a key advantage of this interface capability is the
wide variety of subsystem interconnect topologies which can be used. Therefore, it is
particularly important to recognize that the sample configurations shown are merely
illustrative of the great flexibility which is provided.

This interface allows connection to an application-customized bus. In the presently
preferred embodiment, this bus connects to a picture processor, which is particularly
optimized for graphics and image data. In the presently preferred embodiment, this
picture bus is a "GIP bus," which has 160 data lines and runs at a data clock period of
120-200 ns. (This interface logic is therefore referred to, in numerous places in the
present application, as the "GIP interface.") However, other picture data bus standards
could (less preferably) be used instead. Alternatively, other application-customized
busses could be used (for example, for real-time systems).

The GIP interface allows the GIP and subsystem to pass data and commands to
each other. The interface is shown in the block diagram in Figure 8.

All communication between the GIP and the subsystem passes through a 16 bit
wide bidirectional FIFO 810. One side of the FIFO is controlled by the DTP microcode,
and the other by the GIP microcode. The GIP interface includes a microcode expansion
port interface, so the GIP actually runs microcode (8 bits) that is resident on the
subsystem. The GIP microcode expansion bus is analogous to the DTP microcode
expansion interface described earlier.

The GIP interface provides the services necessary for the GIP graphics processors
to run some distributed microcode in the subsystem. These services include the GIP
microcode clocks, the GIP microcode address and data busses, interrupt and status signals
and a means for serially loading the extension GIP microcode.

The principal components in the GIP interface are the WCS 830, the bidirectional
FIFO 810 (constructed out of unidirectional FIFOs), the status logic 820 and interrupt
logic 840.

The resident GIP microcode allows the GIP to perform the following functions:
Read or write data from the FIFO 810.
Test the FIFO status signals via the status logic 820 and drive the result
out on the open collector condition code interface signal.
Set up the conditions that will cause the GIP to be interrupted (for
example, the FIFO becoming full or empty) by the interrupt logic 840.
Generate an interrupt in the DTP.
From the DTP side, the FIFO looks like any of the other FIFOs except it is 16
bits wide rather than 32 bits wide.

All the details on the format the communication takes, and on whether the
accelerator subsystem or the GIP is the master device, are totally decided by the
microcode running in the two processors. In the 3D workstation environment, as shown
in Figure 36, the preferred hierarchy would be the host as master, the GIP as
the slave, and the numeric accelerator subsystem in the middle.



One advantageous part of the concurrent multiprocessor system of Figure
1 (and elsewhere) is a serial loop interface to the writable control stores.
The topology of this loop, in the presently preferred
embodiment, is shown in Figure 28. (The line shown as 225 in Figures 2A, 3A, 4C, and
6 is broken out, in Figure 28, to show one output line 225A and four return lines
225B, 225C, 225D, and 225E.)

The implementation of the interface to the serial loop has been described above
with respect to the various processors individually, and in connection with the VME
interface 160. However, some of these features will now be reviewed again, so that the
higher-level architecture of the serial loop can be explained more clearly.



tff (TftntwLfllflea 




The serial loop interface provides data access from the host to all of the control
stores. To maximize the net bandwidth of this loop, each separate WCS (including the
WCS extensions) interfaces to the serial loop through a bank of serial/parallel
shadow registers.

The shadow registers which interface to FP WCS 470 are shown as
481A and 481B in Figure 29 and in Figure 4C. The shadow registers which interface to
CP WCS 220 are shown in Figure 2A. The shadow registers
which interface to DTP WCS 320 are likewise shown in the drawings. The shadow
registers which interface to the CP WCS Extension 490 are part of
the CP extension logic in Figure 4A, but are not shown separately.

Each of these registers can load the instruction into its respective control store,
or clock the instruction stream incrementally, or clock the instruction stream along
as fast as possible. Thus, the bandwidth of this line is used efficiently, and only a
minimal number of instructions is required to access the control stores.



10 XdaaXtfiBtEBL 

In the presently preferred embodiment, some additional capability is provided for
control of the serial loop, to provide adaptation to the wide range of
configurations possible.

In the presently preferred embodiment, each subsystem can have up to 6
microcoded processors (one control processor, one data-transfer processor, and as many
as four floating-point processors or algorithm accelerators). Each of these processors has
its own WCS. Each WCS must be written to, to load up microcode, and be read from,
for diagnostics, setting breakpoints, etc.

The main features which help provide this capability include:
A return multiplexer. This collects the serial loops from two internal
sources (the control processor and data-transfer processor), and from the two external
"return buses" (for the microcode of the control processor extensions, and of the multiple
floating-point processors).



i Mriai tRMO lidikh ooOaets tbo MtiftI lo^ 

20 point fgnnww aodite wfam tlio 

■i 



25 poiot pffoooMor nodnloo whaie tlio cooftrol proeoitor ond fkwtlDg^pobit 



loop addrao adeete wUefa modulo drivoo tho miU 

M « mkroeodo load oafalo fait ao 



or nodute can bo loaded 1 
30 'I^<>^toaoArpcooeaK»eerialloop««paiidoDi»c^^ 
With this organization, the protocols to transfer data around the serial loop and
load it into the WCS are quite complicated. Such protocols would normally be done in
software. In the presently preferred embodiment, the time consuming parts of these
protocols have been implemented in hardware, which significantly speeds the
downloading of microcode. As an added benefit, the software overhead has also been
reduced.

In the presently preferred embodiment, the host writes (or reads) the microcode,
a word at a time, to the data register. (The data register, in this embodiment, is
constructed from two universal shift registers. The serial
shadow registers are devices such as the Am29818 made by AMD.) Depending on the serial
mode previously selected, one of three things happens:

If the "hold" mode has been selected, then the data transfer behaves just
like any transfer to memory.

If the "shift" mode has been selected, then immediately after the read or
write cycle ends the data is shifted into (or out of) the serial loop. While this is
happening a busy signal delays further access by the host to the data register.

If the "pulse" mode is selected, then about 500 ns after the write access
the serial data clock is pulsed, to set the shadow register into the required mode.

Loop Topology

Figure 28 shows the large-scale connections of the serial loop.

A single output line 225A is driven by the microcode load logic 610 in the VME
interface 160. (Alternatively, this does not have to be only a single physical line, but
could be a bus instead, e.g. a four-bit-wide bus.) This line is applied to each of the
shadow registers at the periphery of each of the three writable control stores 220, 320,
and 470. (Note that the CP WCS extension 490 is not directly connected to line 225A,
but instead is connected to line 225C, downstream of the primary WCS 220.)

Four return lines are provided, which can be selected by multiplexer 2710. These
return lines are primarily useful for debugging.

Note that there is very little "snaking". That is, there are only two cases where
the output of the serial shadow registers on one WCS is used as input into the
interface of another WCS. In each of these cases the WCS which is downstream in the
serial loop is effectively an extension of the upstream WCS. That is, series connections
of independent processors in the serial loop are generally avoided. The benefit of this is
that the independent microcode programs for different processor modules do not have to
be merged together. This helps programmers to take full advantage of the advantageous
partition of algorithms discussed above. This also helps to avoid any problem with
merging programs which are targeted for WCSs with
different widths and/or depths.

An advantage of the parallelism in the loop topology is that parallel loads can
easily be accomplished. For example, if a common sequence of microcode is to be
loaded into each of the FP modules 130, all of the shadow registers on all of the FP
modules can be enabled simultaneously, and each will be loaded in accordance with the
serial data on line 225A and the microaddresses on bus 211B.
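
A parallel load of this kind can be sketched as a broadcast write gated by a per-module mask, in the spirit of the WCS load mask described earlier. The sketch assumes that bit i of the mask corresponds to FP module i and models each WCS as a dictionary from microaddress to instruction; both choices, and all names, are illustrative assumptions.

```python
def parallel_load(wcs_by_module, load_mask, microaddress, instruction):
    """Write one instruction into every FP module enabled by the load mask.

    wcs_by_module: list of dicts, one per FP module (microaddress -> instruction).
    load_mask: bit i set means module i takes part in the load (assumed mapping).
    """
    for module_index, wcs in enumerate(wcs_by_module):
        if load_mask & (1 << module_index):
            wcs[microaddress] = instruction
    return wcs_by_module


modules = [dict() for _ in range(4)]          # up to four FP modules
parallel_load(modules, 0b1011, 0x010, 0xABC)  # modules 0, 1 and 3 enabled
```

The same serial data and microaddress are presented to all modules once; only the enabled modules latch them, so the load time does not grow with the number of modules.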

As may be seen from Figure 28, the loop topology includes multiple parallel
branches:

CP branch: Output line 225A is provided as input to the shadow register
interface at CP primary WCS 220. The return from the shadow register interface to CP
WCS 220 (line 225C) is fed back into multiplexer 2710.

CP Extension subbranch: Downstream of the shadow register interface to CP WCS
220, return line 225C is also provided as input to the shadow register interface at
the CP WCS Extensions 490. The returns from the shadow register interfaces to the
WCS extensions 490 are all connected to return line 225D, and thereby fed back into
multiplexer 2710. (Since the returns are connected in parallel, the serial output
commands are preferably qualified by an individual module address, to prevent contention
on the return line 225D.)

DTP branch: Output line 225A is also available as a serial input
to the shadow register interface at DTP WCS 320. The return from the shadow register
interface to WCS 320 (line 225B) is fed back into multiplexer 2710.

DTP Extension subbranch: Downstream of the shadow register interface to DTP
WCS 320, return line 225B is also made available as an off-board output. This connection
can be exploited by users, if desired, to provide DTP extension logic. The operation of
such logic is discussed in greater detail below.

FP branch: Output line 225A is also available as a serial input to the
shadow register interfaces at the WCS 470 on each of the numeric processor modules 130
or 130'. The returns from the shadow register interfaces are all connected to line 225E,
and thereby are fed back into multiplexer 2710. (Since the returns are connected in
parallel, the serial output commands are preferably qualified by an individual module
address, to prevent contention on the return line 225E.)

Figure 27 shows greater detail of the components of microcode loading control
logic 610. One important component is the flip-flop 2730, which resynchronizes the
serial data. When the WCS is distributed, controlling the clock skew between the shift
register clock and the shadow registers' D clocks can be very difficult, because of the
many different serial loop configurations. The inclusion of this flip-flop takes care of any
clock skew (as long as the skew doesn't exceed the basic clock period that drives the
controlling logic). State machine 2740 provides D clock outputs, in response to decoded
signals from the host.

Itf&JtatfltfHfttjLBsat 

As discussed above with regard to Figures 6 and 27, the microcode load control
logic 610 can read and write data onto the serial loop 225. It can also write and read to
the CP and DTP microaddress busses 211B and 311B.

DTP Microcode Expansion Loop

The presently preferred embodiment also provides the capability to configure a
second serial interface loop, extending beyond the board. The connection to this loop
is shown as 2840 in Figure 28.

Optionally, the DTP module 120 can be extended off-board, by building additional
components somewhat analogous to the CP WCS extension 490. In such
arrangements, these WCS extensions provide microinstruction outputs. The DTP
extension logic is preferably somewhat
looser than the CP Extension Logic, since the DTP may be used in a
somewhat wider range of environments. It is contemplated that the DTP extension logic
may be useful for interfacing to closely-coupled high-speed I/O devices.



If this expansion option is used, the DTP extensions (if any are used) are in
series with the DTP itself. This prevents contention.



As noted above, the presently preferred embodiment provides two methods of
loading microcode into the floating-point processor: either via a serial loop under control
of the host, or in parallel under control of the control processor. The parallel loading of
microcode is useful because the amount of WCS available on
the floating-point processor is limited (4K or 16K instructions). If there are too many
floating-point processor routines to fit in WCS at once, some form of overlaying is
necessary. Using the serial loop to load overlays is not practical, since the host can load
instructions only slowly (e.g. on the order of 100 microseconds, depending on disk
accesses).

The parallel load capability provided by the presently preferred embodiment
makes use of the very wide data cache memory to hold the entire microcode instruction,
and transfers it to the floating-point processor's holding registers
in one cycle. This is then transferred into the diagnostic shift registers
used for the serial loading, by way of the normal output port.
These devices have an output port (which can also be used as an input port if desired),
but this port is not used for pipelining of microcode bits,
because it is too slow. (Many of the chips register the microcode bits internally
anyway.) This means that the parallel load route just described can make use of this
capability of the serial/parallel registers, and does not impose any penalty on
functionality. The parallel load time is about 800 ns per instruction, which is a
significant improvement over the serial load time.

Another significant point is that the overlaying of microcode in the floating-point
processor can be controlled entirely by the control processor, without requiring any
supervision by the host. If the required microcode is not already available locally,
the subsystem can go out to the host memory and fetch it from there.

In the presently preferred embodiment, each subsystem can have up to 6
microcoded processors (one control processor, one data-transfer processor, and as many
as four floating-point processors or algorithm accelerators). Each of these processors has
its own WCS. Each WCS must be written to, to load up microcode, and be read from,
for diagnostics, setting breakpoints, etc.







Two types of modules can be connected to the cache bus 144:

The arithmetic processing type, as typified by the floating-point processor module
130, or an algorithm or application accelerator 130'.

A High Speed Data (HSD) module, typically used to expand the data cache
memory or to add a high speed I/O channel. This method of expanding the memory
is very different from the use of a bulk memory subsystem which interfaces to
the accelerator subsystem via the DTP microcode expansion bus 2824. The HSD method
will support the same bandwidth as the data cache memory 140.

The multi-module configuration allows for up to 4 floating-point processor
type modules and 2 HSD modules. These figures have been chosen for mechanical
and electrical reasons rather than any limiting architectural reason.
The FP 130 modules are selected by the module select bits. Normally these are
under control of the control processor module 110, but the VME interface can override
this. This would only be used for downloading microcode or during debugging. The
select bits control every aspect of a module's operation except for resetting (which
is controlled by the reset signal).

Tha HBD module ^ ■o i * *«* « ri A ^^oi Wi g ^ 
The connections to the module are summarized below. The list groups the
connections into logical areas and identifies which of the two types of module would use
them.


Fisuret 4QA and 40B 



Tbft coimecttona are made via ris 98 wigr DIN < 
the pfayikal ccnaectton eooflguratko of the ] 
In the preeai^ pretered embotfimeol 

for Date caeho tranatoK 2S6 bite of 38 UU oC DGM 
Wf4le enafalt^ a Holdh« Begiater OB bit, a delived aeeeM d9ial(to 4 
o^ a immi i Mwtol e dow memoviaa), and a Holcfinc Begtater GR bit; 

fbr CP iaterftee: d bita of CP addrea^ 18 bila of <kt% 18 bte oT CP 
a«|iMcer addrea^ the CP microcode cloe^ 
ctek, ooe intemqit liai^ and Qoe CooAtlaa Coda; 

for mlcroeode loadiD^ aapeiate biea for CP WC8 ou^ut e&abl% CP 
utput eoaMe. CP WCS write «mb^ CP ^Mt^ f!P mmM ^ ^ 
i Id, FP VC8 oat|Mi eaafalt^ FP Pupate oii4»t ei^da^ FP P4>^ 

da^ FP Modo, FP m i rrn a d i l iiM select, FP lypw/lower WCS aeled; FP aarM 
LOUt»FPSerialiii,aawdaa a lis bit Serial doek/WCa Load Maak and a 8 

bit Serial Loop tetm aeleet; 

ffBDeraQf oaoail: 8 bita oT Bfodiile loloct, ai 
> ite^ F^ niQ. FP B^ealqpoiot, Mkrocode Loop^ FP Beeet;, 

Hbm: 21 Bdoo for -I- 5 Volta, 8 aaca for ^ VQlt% and 181 Qf^^ 
Bvetjr module ^jfpe baa aooeaa to al the I 



the 

:iB 

inthidaa e CP 



in Figure 10» ooe vory uaeAtl dmm of ( 

>dtdoo lAft- Tii *him m^u^^M.,.^ ^^^^ mnitiioo 180 i 
by e oontrel prnrm c ca modnle 110. The CP module UO not on^ 
but atoo direetif Qontrola all data tnm^re to and 
ISa AB of the mnaerio procetaor modnba ar« 
to acHdw bua liC BMh of the numeric proomeor modute 180 
n Logic 4101 m < 
120 manages data transfers between the cache 140 and the outside world, as discussed
above.



The very high memory bandwidth between the data cache memory 140 and the
numeric processor modules 130 or 130' will, in many cases, allow a number of modules
to be working in parallel without suffering data starvation.

The number of numeric processor modules that can operate usefully depends very
much on the application or algorithm mix. In the presently preferred embodiment, this
has been limited to four. This limitation has been imposed primarily for electrical and
mechanical reasons. However, once all the memory bandwidth has been used there is
no advantage in increasing the number of floating-point processors.

Since the numeric processors run autonomously, the architecture does not need
to include any protocols for floating-point processor to floating-point
processor data exchange. This keeps the interfaces very simple, as it removes
the need for arbitration.
Preferably an instruction write bus is shared by the
numeric processors. Preferably the most significant address bits are decoded
according to logic such that any one of the numeric processors can be addressed
individually, all of the numeric processors can be addressed together, or some
(but not all) groups of these processors can be addressed together.
That is, the control of multiple floating-point processors needs to take into
account the fact that an algorithm might run on any one of the floating-point
processors, or parts of it might run on some or all of the floating-point processors.
This may require defining a long term or short term control relationship between the
control processor and the floating-point processor. The control processor can select on a
cycle by cycle basis which floating-point processor to control, or,
for a longer term relationship, this can be defined more globally.

In the presently preferred embodiment, this is achieved by using a microcode bit
that selects on a per cycle basis the control mechanism which defines which floating-
point processor to use. The control mechanism can be either the use of other microcode
bits, or the use of the contents of a register (which would have been preloaded by the
microcode). The bits in the microcode instruction field can be used for definition in the
short term, i.e. on a per cycle basis, while the register defines the long term usage.
Examples of the use of the two modes might be:

Short term - When doing an FFT with 4 floating-point processors the
control processor will spend a few cycles with one floating-point processor, loading the
next butterfly's data and collecting the previous butterfly's results. It then moves on to
the next floating-point processor to deal with another butterfly.

Long term - When doing a vector add the floating-point processor to use
is selected before the vector add routine (in the control processor) is called. This means
that the control processor doesn't need to know which floating-point processor (or type
of floating-point processor) is being used to do the calculations.
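
The two selection mechanisms can be sketched as a two-way choice made on every cycle. This is a simplified illustration: the function name, the argument names, and the use of a plain module index as the select value are assumptions, since the actual encoding of the select bits is not given here.

```python
def select_fp(use_register, microcode_select_bits, preloaded_register):
    """Pick the floating-point processor addressed on this cycle.

    use_register: the per-cycle microcode bit that chooses the mechanism.
    microcode_select_bits: short term choice carried in the instruction itself.
    preloaded_register: long term choice loaded earlier by the microcode.
    """
    return preloaded_register if use_register else microcode_select_bits


# Short term: an FFT loop visits four processors in turn, one butterfly each.
fft_targets = [select_fp(False, butterfly % 4, 2) for butterfly in range(6)]

# Long term: a vector add routine runs on whichever processor was preselected.
vector_add_target = select_fp(True, 0, 2)
```

In the long term case the routine's own microcode never names a processor, which is why the same routine can run unchanged on any of the modules.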

Figure 23 schematically shows how the module addresses are decoded.
The operation of this decoding is discussed in connection with the FP module 130.



As mentioned above, a large amount of expansion memory can be directly attached
to the cache bus 144. This is a further advantage of the physical and electrical
protocols used. An example of such a structure is shown in the drawings.



Figures 38A and 38B show key features of the physical layout.

Figure 38B is a daughter board which is smaller than the main
board of Figure 38A. Figure 38B provides the hardware for a floating-point processor
module 130 (including the accompanying control processor extension logic). Figure 38A
contains the data transfer processor 120, the primary portion of control processor 110,
the data cache memory 140, the command memory 190, and the interfaces
150, 160, 170, and 180. The two boards together provide a complete system like that
shown in Figure 1.

The two boards have an identical pattern of six connectors 3810. Since these
connectors are male/female, more boards may be stacked together. For example, the
configurations shown in Figure 9 and Figure 10 may be achieved by stacking multiple
floating-point modules 130 and/or algorithm accelerators 130' together. (However, for
larger numbers of boards, it is contemplated that it may be more advantageous to use a
backplane for these connections. This would give a more convenient configuration.)

The connectors 3810 are preferably each 96 pins wide. Thus, although the full width
of the cache bus 144 is routed through these connectors [...] to enlarge the data
cache memory 140 may also be [...] using this pattern of connectors. As noted above,
additional memory on the cache bus 144 provides a relatively large [...] over a very
high bandwidth channel. In the presently preferred embodiment, up to 12 Mbytes can
be accessed, within 100 ns at 340 Mbytes/sec.
Figure 38A shows the locations of the largest individual [...] the general
allocation of some functions in other areas [...] is a triple-height Eurocard [...]
VME interface [...] of the boards to minimise backplane stub length. (The VME [...]



The memory banks 510 are generally located near the connectors 3810, at the [...]
shown at the top of the drawing. The command memory 190 and VME [...] 680 are also
located in this area. [...] portion of the center of the board is taken up with the
CP holding [...] and the holding registers 560B.
The DTP and CP IPUs 340 and 240, the DTP and CP sequencers 310 and 210, and the CP
address generator 230 are all separately shown.

The DTP module's writable control store 320 is generally shown below the [...] 3810
near the bottom left of the drawing, and the CP module's writable [...] 220 is
generally shown below the connectors 3810 near the bottom right. The GIP interface
170, and the DTP microcode expansion interface 180, are generally [...] at the bottom
left corner. (This area also contains some DIN connectors, not [...] provide the
physical connection with which this logic is [...].) [...] bottom right corner
contains not only the data pipe interface 150, but also [...]

The daughter board, shown in Figure 38B, is smaller. (Figures 38A and 38B are not
drawn to the same scale.)

The holding registers 420 are nested between the connectors 3810, in the areas shown
top right and top left. In between these registers is an ECL neighborhood 3820, where
ECL parts (which tend to have high power dissipation) are located. (In the presently
preferred embodiment, the ECL parts include the transfer clock generator 412, and the
FP microcode clock generator 480.) The isolation of these parts also helps to
minimise the injection of TTL noise into the quieter ECL parts.

It may be seen that the chips used to construct the Register File 430 are large, as
are the ALU 450 and multiplier 440. (In this embodiment, each of these chips is in a
pin-grid package.)

The FP module's WCS 470 is generally located in the left middle portion of the
Figure. Just below this is the FP module's next-address logic 477. Note that the
scratchpad memory 1610, which the FP module's control logic can also use for a [...],
is physically close to the next address logic 477.

The CP extension logic, which is used to extend the CP microcode for control of each
of the daughter boards 130 or 130', is largely located at the bottom edge of the
board as shown. In particular, the WCS expansion memory 490 is shown at the bottom
left.

It is particularly advantageous to separate the floating-point processor [...] a
separate subboard. (Note also that, if multiple numeric processor modules are used,
each processor module 130 is preferably isolated on its own respective subboard.) The
numeric processor modules 130 are particularly likely to generate noise, since they
include much high-speed logic, and they are also significantly susceptible to noise,
since some of their lines and components use ECL levels.

Moreover, note that the holding registers 420, the local transfer bus 422, the
register files 430, and the transfer clock 412 are all located on the subboard. This
is advantageous, since the highest-frequency lines are all isolated on a common
subboard. This is particularly advantageous in embodiments using multiple numeric
processor modules, since some degree of isolation among the various patches of very
high-speed logic is thereby provided.

In the presently preferred embodiment, the following PALs (programmed logic arrays)
are used. All of the PALs presently used are TTL. Most are from the 16 and 20 series,
but a few others are also used.

However, it will be readily recognised by those skilled in the art that a wide
variety of other implementations could be used instead. The division of functions
into hardware blocks can be changed, and the hardware implementation for a given
group of functions can also be changed. Many of the functions presently embodied in
PALs could be implemented using MSI logic parts, or as blocks in an ASIC or
semi-custom integrated circuit, or by programming VLSI logic chips. However, this
implementation is given in great detail here to provide full disclosure of the
presently preferred embodiment, to ensure full compliance with the patent laws of the
United States.

CP PALs

Following are brief descriptions of some of the most important PALs used in the
control processor module 110.

Clock Waveform Generator PAL 250

This PAL generates the timing waveforms used by the CP and the DTP. As discussed
above, four clocks are produced. These each follow one of 4 predefined waveform
sequences. The 4 sequences are characterised by different periods, namely 4, 5, 6 and
7 times the input clock period. This translates to 100, 125, 150 and 175 ns, when a
40 MHz oscillator is used, as presently preferred. The microcode clock and the
pipeline clock have identical waveforms, but the microcode clock [...] the pipeline
clock running, for microcode loading. The microcode clock is always high for 2 cycles
(of the oscillator), and then is low for 2, 3, 4 or 5 cycles, as selected by the
cycle length inputs. The cycle length is chosen from the maximum requested by the CP
(2 bits) and DTP (2 bits). Since the cycle length is driven from a pipeline register
(although it might better have been designed to be unregistered), the cycle length is
sampled at the last possible moment, to give the maximum time for it to propagate
around the loop. This timing is more critical than first appears, because [...]
immediately following that in which they are generated.
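
The cycle-length arithmetic described above can be sketched as follows. This is only an illustrative model: the function names are hypothetical, and the mapping of the 2-bit request codes 0..3 onto the four periods (0 being the shortest) is an assumption, since the text does not give the encoding.

```python
OSC_PERIOD_NS = 25  # one period of the 40 MHz oscillator


def cycle_length_code(cp_request, dtp_request):
    """Return the longer of the two 2-bit cycle-length requests (0..3)."""
    for req in (cp_request, dtp_request):
        if not 0 <= req <= 3:
            raise ValueError("cycle-length requests are 2-bit values")
    return max(cp_request, dtp_request)


def microcode_clock(code):
    """One period of the microcode clock, one sample per oscillator cycle.

    The clock is high for 2 oscillator cycles and then low for 2, 3, 4 or 5.
    Assumption: code 0 selects the shortest low time.
    """
    return [1, 1] + [0] * (2 + code)


def period_ns(code):
    """Length of one microcode clock period in nanoseconds."""
    return len(microcode_clock(code)) * OSC_PERIOD_NS
```

With these assumptions the four codes give periods of 100, 125, 150 and 175 ns, matching the figures quoted in the text.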

The write-enable gate signal goes low one cycle after the microcode clock goes high,
but returns high 1 cycle before the microcode clock does.
The times-two clock runs at twice the frequency the microcode clock does, and its
rising edge occurs at the same time there is a microcode clock edge.

When the write-enable gate signal is low, an input from the VME interface memory 660
is sampled. If this input shows that the memory is busy, the cycle length will be
extended until this input changes. This allows a safety margin of access time for
memories whose access time may be slowed by access clash, offboard communication,
etc. (The busy signal, from the PAL's viewpoint, only inserts extra cycles when the
write gate is low.)

Another input selects whether the clocks free-run or are single stepped.



CD Bus Source PAL

This PAL decodes the CP microcode bits that select which source drives the CD bus
112, and drives the output enable lines of the appropriate device. Whenever any 16
bit source is selected (such as address generator 230), this PAL also outputs a
signal to activate the sign/zero extend PAL 216. When a reset signal is active, no
[...]



Similar PALs are used to decode the data source field for the TD bus 122. The PAL
which selects the TD data bus source also contains logic to gate the FIFO read with
their corresponding FIFO empty status signals, to prevent the reading of an empty
FIFO (which could cause errors within the FIFO).

CD Bus Destination PAL

This PAL decodes the CP microcode bits that select the destination for the data on
the CD bus 112, and drives the read enable line(s) of the appropriate device.

Similar PALs are used to decode the data destination bits for the TD bus 122.

Whenever the source or destination device has chip enable lines which must be driven
(e.g. the memory in VME interface 160, or in command memory 190), the respective chip
enable lines are driven.

Sign/Zero Extend PAL 216

This PAL performs a sign or zero extend function, depending on an enable signal and
on the high bit of the source data. Since the PALs preferably used are only 8 bits
wide, a pair of them is used for every sign/zero extend operation. This PAL is used
in two places: one pair hangs on the CD bus 112 (shown as block 216 in Figure 2A),
and one pair (shown as block 316 in Figure 3A) hangs on the TD bus 122.

The bus source logic provides an enable bit to the sign/zero extend logic 216, when
a 16-bit source is being accessed.

Figures 14A and 14B show the structure and operation of this PAL. More precisely,
Figure 14A shows a slightly different embodiment, where three eight-bit multiplexers
are used for each sign/zero extend operation. This permits single-byte sources to be
used, which is not possible with the presently preferred embodiment. Figure 14B shows
the command structure used with the hardware of Figure 14A.
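
The effect of the sign/zero extend function on a 16-bit source can be illustrated with a short sketch. The function name and the `sign_extend` flag are hypothetical, and a 32-bit bus is assumed (the pair of 8-bit PALs supplying the upper 16 bits).

```python
def extend16(value, sign_extend):
    """Extend a 16-bit source to an assumed 32-bit bus word.

    The upper 16 bits (modelled here as the output of a pair of 8-bit PALs)
    are all ones only when sign extension is requested and the high bit of
    the source data is set; otherwise they are zero.
    """
    low = value & 0xFFFF
    upper = 0xFFFF if (sign_extend and (low & 0x8000)) else 0x0000
    return (upper << 16) | low
```

For example, `extend16(0x8001, True)` fills the upper half with ones, while `extend16(0x8001, False)` leaves it zero.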

Multiway Branch Address PAL 217
This PAL is used to implement the multiway branching capability of the sequencer
210. This PAL takes a three-bit condition code and inserts it into the least
significant three bits of the microcode constant field. The modified constant field
is fed back onto the sequencer bus 215. A shift field input controls whether the
result is shifted 0, 1 or 2 places left (i.e. multiplied by 1, 2 or 4), or whether
the input constant field is routed through unchanged. Another input enables the
tristate output drivers of this PAL.

As shown in Figure 3A, this PAL is preferably connected in parallel with a tri-state
buffer 318. Only the least significant 8 bits of the constant field are routed
through the PAL 317. The most significant 8 bits are routed through the buffer 318.
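
A minimal sketch of the constant-field modification follows. It rests on stated assumptions: the function and parameter names are hypothetical, the shift is applied to the low byte after the condition code has been inserted, and bits shifted out of the low byte are discarded. The text does not settle these details.

```python
def multiway_constant(constant, cond_code, shift, passthrough=False):
    """Model the multiway-branch modification of a 16-bit constant field.

    The high byte passes through unchanged (the tri-state buffer path); the
    low byte has its three least significant bits replaced by the condition
    code and is then shifted left by 0, 1 or 2 places (assumed ordering).
    """
    if passthrough:
        return constant & 0xFFFF
    if shift not in (0, 1, 2):
        raise ValueError("shift must be 0, 1 or 2")
    high = constant & 0xFF00
    low = constant & 0x00FF
    low = (low & ~0x07) | (cond_code & 0x07)   # insert the condition code
    low = (low << shift) & 0xFF                # multiply by 1, 2 or 4
    return high | low
```

Under these assumptions, a shift of 2 spaces the eight possible branch targets four microinstructions apart.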

(Preferably the modified constant field is used with a relative sequencer
instruction, but it may alternatively be used, with care, with absolute or indirect
instructions.) The multiway branching operation is discussed in greater detail below,
in connection with Figure 30.

Data Input Condition Code Select PAL

This PAL (located in the DTP module 120, and shown as multiplexer 312 in Figure 3)
selects a set of FIFO status codes which can be tested by the DTP microcode sequencer
310. The selected set is encoded, and provided to the sequencer 310, to permit
multiway branching on these conditions. The source for these status signals can be
selected to be within one of four bus input interfaces: the GIP interface 170, the
two input ports of the data pipe interface 150, and the VME interface 160.

DTP PALs

Following are brief descriptions of some of the most important PALs used in the data
transfer processor module 120 and in the interface units 160, 170, 180.

VME Address Decode PALs

One PAL decodes the least significant bits of the VME address, and the 6 address
modifier bits. The output goes active when the VME address and address modifiers
[...] the previously selected ones. (Up to 15 address and address modifier
combinations can be programmed, and one of these can be selected by a 4 bit switch
signal.) There is also an input from a VME interrupt PAL, which indicates when an
interrupt acknowledge cycle is in progress. This is ORed with the address decode to
drive the output.

A similar PAL decodes the most significant address bits ([...]) of the VME address
bus. On this PAL, an additional input selects whether the top 8 bits of the address
are used or ignored.

DMA FIFO Status and Clock Control PAL

This PAL controls the routing of the clock and status signals from the DMA FIFO 670.
This PAL also controls whether the clocks of these FIFOs are under the control of the
DMA controller 640 or decoded from the VME interface.






The read decode PAL decodes the [...] possible read sources from the VME bus. The
internal VME addresses are decoded and qualified by the data strobe, write enable,
and board select signals.

The write decode PAL decodes the 9 possible write sources from the VME bus. The
internal VME addresses are decoded and qualified by the data strobe, write enable,
board select, and a VME write enable signal. The VME write enable signal can be used
to control the setup and hold requirements of the various write enables or clocks,
independent of the VME bus timings.

Slave access timing PAL

This PAL generates the timing for data transfer acknowledgements in the bus
controller 650. The timing can be tailored to the register or memory that is being
read or written, because essentially the same addresses and qualifiers which the
decoder 611 receives are also inputs to this PAL. Another input delays the timing if
the dual ported VME memory 660 is busy. (If this occurs, an extra cycle is also
inserted after the busy signal ends.)

Another input delays the timing if the serial loop is busy shifting data. (When data
is written to the serial loop register 680, a delay of three cycles is inserted after
the write enable signal goes high, so that the serial loop state machine (in a
different PAL) has time to latch in the data.)
This PAL also generates an enable signal whenever the VME interface memory 660 is
being accessed.

The PAL implements multiplexer 2710 and state machine 2740. The state machine
function is connected to control the '818 serial/parallel registers at the interface
to each of the writable control stores in the serial loop. (These include [...] and
the DTP WCS 320, and also a FP WCS 470 and CP WCS extension 490 on each of the
processor modules 130.)

The state machine controls a shift register and a serial data clock. When a data
transfer to or from the shift register is occurring, the shift register and serial
data clock are controlled as a function of the access type (i.e. read or write), and
in accordance with a mode signal. The serial loop mode signal specifies one of three
access modes:

DATA_HOLD (00): read/write like a normal register.

DATA_SHIFT (10): read/write like a normal register but then shift the data by 16
bits around the serial loop while toggling the serial data clock.

DATA_PULSE [...] the serial data clock once.

In the DATA_SHIFT mode, the state machine controls the shift register, so that on
one cycle it shifts and on the next it holds. This two cycle pattern repeats 16
times, so the contents of the shift register are inserted into the serial loop. On
the hold cycles the serial data clock is asserted. While the data is being shifted, a
busy signal is active to hold off any further VME accesses to the shift register
until the shifting is finished.

In the DATA_PULSE mode, 300-400 ns after a write operation, the serial data clock is
pulsed high, once. This pulse loads up the internal flip-flop inside the '818 shadow
register. (Each of these shadow registers contains an internal flip-flop, which
controls its operation mode when backloading data into the corresponding WCS.) No
data is shifted around the serial loop when the D clock is pulsed. (The delay allows
data to stabilise, i.e. to percolate around the loop.) During this operation a busy
signal is activated to inhibit any VME accesses to the serial loop.
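
The shift/hold pattern of the DATA_SHIFT mode can be sketched as a simple cycle generator. The tuple layout and the function name are hypothetical, and the final idle cycle that releases the busy signal is an assumption added so that the end of the sequence is visible.

```python
def data_shift_cycles(n_bits=16):
    """Yield (shift_enable, serial_clock, busy) for each cycle of a DATA_SHIFT access.

    Each bit takes two cycles: one in which the shift register shifts, and one
    in which it holds while the serial data clock is asserted. Busy stays
    active until all bits have been shifted.
    """
    for _ in range(n_bits):
        yield (1, 0, 1)   # shift cycle
        yield (0, 1, 1)   # hold cycle: serial data clock asserted
    yield (0, 0, 0)       # assumed idle cycle: busy released
```

Listing the cycles with `list(data_shift_cycles())` gives 32 busy cycles containing 16 serial clock assertions, followed by the idle cycle.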

This PAL also contains multiplexer 2710, which collects the four serial loop return
paths 225B, [...], 225D, and 225E, and [...] flip-flop 2720.

DCM and DCM I/F PALs

Following are brief descriptions of some of the most important PALs used in the data
cache memory 140, and in the CP Extension Logic (located on the FP module 130 but
controlled by the CP module 110), which handles the data interface to the cache
memory 140.

DCM Address Decode PAL
This PAL, together with a multiplexer, is shown as block 560 in Figure 5. This PAL
decodes the data cache memory address. Two address inputs are provided: input 516
corresponds to bits 19-25 of the CA bus 111, and input 517 corresponds to bits 19-25
of the TA bus 121. A control line 521, generated by arbitration logic 536, selects
which address to decode.

DCM Holding Register Control PAL
This PAL (controlled by the CP and DTP microcode streams) generates various control
signals used to control the three banks of data holding registers 560A, 560B, and
420. Microcode bits are decoded to drive the clock and output enable signals. The
signals to control bank 560A are controlled by the CP access signal 536. The signals
to control bank 560B are controlled by both CP access signal 536 and DTP access
signal 537, because the DTP port has a lower priority.

The signals to control register bank 420 (the FP holding registers on the
floating-point modules 130) are ANDed with the appropriate module select signals. All
the clock signals are qualified by the write enable gate clock signal, to control the
timing of the positive clock edge.

Another set of signals can disable the memory output on access cycles. This allows
the holding registers to be read back without writing into the data cache memory.
(These signals are similarly used in another PAL to get access to the write mask
information.)

DCM Write Flag Register PAL

Several PALs are used to implement the write mask logic 530 (which provides an 8 bit
write mask signal 512 to the memory bank 510). The PAL corresponding to the DTP
interface registers 560B will be described first. A similar PAL is used to track the
status of the other register set 560A, which is accessed by the CP module 110.

The purpose of this PAL is to remember which of the 8 F_words in the holding
register 560B have been written to by the DTP. When a data cache memory write is
required, the outputs of this PAL mask the parallel write from the DTP holding
registers. Only those F_words that have been updated are actually written into the
data cache memory bank 510. Whenever a write to a holding register occurs, the
corresponding flag bit is set within the PAL. The flag bit to set is decoded from the
DTP address under these conditions. The flag bits are cleared on a data cache write.
(However, due to the pipelined operation, the DTP can write to the holding register
560B on the same cycle. In this case the flag bit would remain set.)

In addition, all 8 flag bits can be set simultaneously (in response to a microcode
command). This allows block writes. A reset signal clears the flags. The logic is
completely synchronous and is clocked by the microcode clock generated by clock
generator 250.

Another input signal enables the read back mode. In this mode the state of the flag
register can be serially output, via the two least significant bits. The microcode
can read the flag bits in the two least significant bits, and, by swapping with the
other flag bits, the microcode can read all the flag bits. The DTP address selects
which of the 3 flag bits are to be swapped with even flag bits, and which with odd
flag bits.
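
The set/clear behaviour of the write flag register can be modelled in a few lines. This is a behavioural sketch only: the class and argument names are hypothetical, the read back mode is not modelled, and the relative priority of simultaneous inputs (for example a set-all command together with a cache write) is an assumption.

```python
class WriteFlagRegister:
    """Behavioural model of the 8 write flags, one per F_word."""

    def __init__(self):
        self.flags = 0

    def clock(self, write_index=None, cache_write=False, set_all=False, reset=False):
        """Advance one microcode clock; return the write mask used on a cache write."""
        if reset:
            self.flags = 0
            return 0
        mask = self.flags                      # flags seen by this cycle's cache write
        nxt = 0 if cache_write else self.flags  # a cache write clears the flags
        if set_all:
            nxt = 0xFF                         # block write: mark all 8 F_words
        if write_index is not None:
            nxt |= 1 << (write_index & 7)      # same-cycle write leaves its flag set
        self.flags = nxt
        return mask if cache_write else 0
```

Writing F_words 2 and 5 and then performing a cache write returns a mask with exactly those two bits set; a holding register write on the same cycle as the cache write leaves its own flag set afterwards.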

FP Write Mask PAL
This PAL generates the write mask for transfers between the FP holding register and
the data cache memory. The parameters that control the mask generation are the number
of F_words to write, and the F_word to start from.

FP PALs

Following are brief descriptions of the programmed array logic units (PALs) used in
the FP module 130, in the presently preferred embodiment.



This PAL qualifies some of the signals used to load microcode into the FP module's
own WCS 470, and/or into the CP module's extended WCS 490 (i.e. the WCS portion
located on the FP module), with a module select signal.

Host-source Module select PAL
This PAL compares the module address inputs from the host with local switch
settings, to see if this module has been selected. One set of inputs enables WCS
loads to occur.

A one-bit-per-module address is provided. This permits WCS writes to be
independently controlled for all of the modules 130. (By contrast, as discussed
above, data accesses use module addresses having fewer bits than the maximum number
of modules, so that not all combinations of modules can be selected.)

As shown in Figure 28, the preferred topology of the serial command loop is such
that two loop portions 2840 enter each numeric processing module 130: one portion
225A to provide input to that module's own WCS 470, and one portion 225C to provide
input to the WCS extension 490 on that module. Thus, two separate output commands are
provided (and further qualified by the module address), so that the WCS 470 and the
WCS extension 490 can feed their outputs onto the common return busses 225E and 225D
(respectively), which any of the modules can drive when selected.

CP Module Select PAL
This PAL compares the module address selected by the CP against the locally stored
values. If a match is found, then four outputs are asserted. Two of these outputs
enable the control signals to clock or output enable the holding registers 420.
Another output signal drives an LED, to give a visual indication of which [...]
selected. The final output enables a condition code bit ("selected") which is
returned to the main board. The final output is the qualification signal to most of
the logic controlled by the CP extension microcode, to enable the action defined by
microcode fields (or registered values) to take place.

Two PALs are used to control the WCS 470. These two PALs are located in separate
areas, but they are interlocked together because both affect the WCS. The first PAL
is used to control the Instruction Register (which is used for microcode compaction
as described above). The other controls parallel microcode loading.

The first PAL controls the output enables of the two RAM chips in the WCS 470. In
the presently preferred embodiment, the WCS 470 is configured using two RAM chips, to
provide a better match to the interface register set 420 for parallel loading. Since
(in the presently preferred embodiment) the cache bus 144 is multiplexed down to a
64-bit data path into the FP module 130, the division of the WCS 470 into two
portions provides a better match for parallel loading of the microinstructions (which
in the presently preferred embodiment are 104 bits long).

When the host is loading microcode, the instruction register is disabled. In this
case the output of the instruction register is always disabled, and the RAM outputs
are controlled by a signal which is generated by the host.

The second PAL performs two separate functions. These are combined only to achieve
hardware compaction.

The first function is to control which bank of pipeline registers 476 to enable
during the reading of the WCS by the host.

The second function is to adjust the transfer length, i.e. the number of words to
transfer between the holding registers and the register file.

There are two pipeline register output enable signals, and they are never active at
the same time. (These signals are used to enable the two banks of register 476.) As
discussed elsewhere, this structure corresponds to the two banks of WCS [...] of the
pipeline registers to be output enabled, the busy signals must be [...] module select
and FP pipeline register output control (from the host) must be active.

The transfer length field is coded so that 1 represents one word to transfer, 2 for
two words etc. To specify eight words to transfer, 0 is used. The transfer clock
generator (part of the cache bus interface logic 460) needs to know the number of
transfer cycles, and this is the number of minor cycles + 1 (for pipeline startup).
The number of minor cycles is a function of the transfer length and its start
position.
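
The length coding and cycle count can be illustrated as follows. The function names are hypothetical and the field is assumed to be 3 bits wide. The minor-cycle formula assumes that each minor cycle moves one aligned pair of words and that the transfer does not wrap past the eighth word; the source states only that the count depends on length and start position.

```python
def decode_transfer_length(field):
    """Decode the transfer length field: 1..7 mean 1..7 words, 0 means 8."""
    if not 0 <= field <= 7:
        raise ValueError("transfer length field is assumed to be 3 bits")
    return 8 if field == 0 else field


def minor_cycles(start, length):
    """Count the aligned word pairs touched by a transfer (assumed model)."""
    if not (0 <= start <= 7 and 1 <= length <= 8):
        raise ValueError("start must be 0..7 and length 1..8")
    if start + length > 8:
        raise ValueError("wrap-around is not modelled in this sketch")
    first_pair = start // 2
    last_pair = (start + length - 1) // 2
    return last_pair - first_pair + 1


def transfer_cycles(field, start):
    """Minor cycles plus one extra cycle for pipeline startup."""
    return minor_cycles(start, decode_transfer_length(field)) + 1
```

Under this model a two-word transfer starting at an odd position straddles two pairs and so needs two minor cycles, whereas the same transfer from an even position needs one.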



Two PALs are used to control the handshaking logic. The first PAL performs two
independent functions. First, it controls the handshaking between the CP and FP
module 130. (In this function, it implements a state machine having a state diagram
as shown in Figure 22.) Secondly (and independently), it also controls bank selection
when the register file 430 is used in a double buffered mode. (The principles of
operation of this mode are generally shown in Figure 20, and are discussed above.)

The handshaking state machine indicates the CP is to wait for the FP by driving the
CPWAIT output HI. If the FP is to wait, the handshaking state machine indicates this
by driving FPWAIT HI.

If both CPWAIT and FPWAIT are HI, and the CP is the first processor to assert its
done signal, then the sequence is as follows:

1. When CPDONE is found to be HI, then FPWAIT is driven LO.

2. CPWAIT stays HI and control remains in this state until FPDONE goes HI.

3. When FPDONE has gone HI, CPWAIT is driven LO.

4. Both CPWAIT and FPWAIT signals remain LO until the corresponding DONE signals
are deasserted.

The above sequence is duplicated, with the roles reversed, if the FP asserts FPDONE
first.

If CPDONE and FPDONE both arrive at the same time (i.e. are both first sampled HI on
the same clock edge), then both CPWAIT and FPWAIT go LO together.
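
The handshake sequence just described can be expressed as a small clocked model. The class name is hypothetical. One point is an assumption rather than something the text states: after both WAIT outputs have gone LO, the model returns both to HI once both DONE inputs are deasserted.

```python
class CpFpHandshake:
    """Clocked model of the CPWAIT/FPWAIT handshake (True = HI, False = LO)."""

    def __init__(self):
        self.cpwait = True
        self.fpwait = True

    def clock(self, cpdone, fpdone):
        """Sample CPDONE and FPDONE on a clock edge; return (CPWAIT, FPWAIT)."""
        if self.cpwait and self.fpwait:
            if cpdone and fpdone:          # both done on the same edge
                self.cpwait = self.fpwait = False
            elif cpdone:                   # CP first: release the FP
                self.fpwait = False
            elif fpdone:                   # FP first: release the CP
                self.cpwait = False
        elif self.cpwait:                  # CP was first, waiting for FPDONE
            if fpdone:
                self.cpwait = False
        elif self.fpwait:                  # FP was first, waiting for CPDONE
            if cpdone:
                self.fpwait = False
        else:                              # both LO: hold until DONEs are removed
            if not cpdone and not fpdone:
                self.cpwait = self.fpwait = True   # assumed re-arm condition
        return (self.cpwait, self.fpwait)
```

Stepping the model with CPDONE asserted first shows FPWAIT dropping immediately while CPWAIT stays HI until FPDONE arrives.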

The bank swap side of this PAL is separate from the CP/FP handshake just described.
The two input signals that control this state machine are SCPBANKSEL and FPSWAP.
SCPBANKSEL specifies how the CP wants the banks to be allocated when a swap point is
reached by the FP. When the FP reaches a swap point it drives FPSWAP active until the
swap point has been passed. (Note that the swap points are synchronised by the CP/FP
handshake logic.) At the FPSWAP point the state of SCPBANKSEL is the new state of the
BANKSEL output, and outside the swap point the BANKSEL state [...]

The first PAL runs synchronously to the FP, so another PAL is used to capture the
CP-synchronised signals which indicate that the CP module 110 has finished, or that
it wants to swap banks.

This second PAL is governed by three handshake mode bits (subject to the module
select signal). The three handshake mode bits are allocated as follows: bits 0 and 1
are encoded to implement the following actions: 00 No operation; 01 Set CPDONE; 10
Clear CPDONE; 11 Test mode. Independently of this, bit 2 requests that the register
banks be swapped.

The CPDONE state remains unchanged across microcode cycles, unless the instruction
is a set or clear operation.
This PAL can detect a positive edge on bit 2 of the mode field, by comparing the new
input with the previously registered version. When the edge is detected this toggles
the state of the bank select output.
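
A short model of the mode-bit decoding and edge detection follows. The class name is hypothetical. Two behaviours are assumptions: the test mode (11) is treated as leaving CPDONE unchanged, and nothing (including the registered copy of bit 2) changes while the module is not selected.

```python
class CpHandshakeCapture:
    """Model of the handshake mode bits: bits 1:0 act on CPDONE, bit 2 swaps banks."""

    def __init__(self):
        self.cpdone = False
        self.banksel = False
        self._prev_swap = False   # previously registered version of bit 2

    def clock(self, mode, module_selected=True):
        """Apply one microcode cycle; return (CPDONE, bank select)."""
        if module_selected:
            action = mode & 0b011
            if action == 0b01:
                self.cpdone = True        # Set CPDONE
            elif action == 0b10:
                self.cpdone = False       # Clear CPDONE
            # 00 (no operation) and 11 (test mode, not modelled) leave CPDONE alone
            swap = bool(mode & 0b100)
            if swap and not self._prev_swap:
                self.banksel = not self.banksel   # positive edge toggles bank select
            self._prev_swap = swap
        return (self.cpdone, self.banksel)
```

Holding bit 2 high across several cycles toggles the bank select only once; it must return low before another swap can be requested.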

Interrupt Capture PAL

Another PAL is used to capture clock edges on CPWAIT, FPWAIT, and several interrupt
signals. The error interrupt shares the same interrupt output as the breakpoint
interrupt, but has its own mask bit. The outputs are reset when the corresponding
mask bit is driven LO, but this mask bit must be returned to the HI state for further
interrupt edges to be detected.

Microaddress and clock control PAL
This PAL performs two independent functions: control of the FP microaddress source,
and control of the FP clock.
The microaddress source is selected by two bits of input, and can be as follows:
(00) FP Next Address Logic 477; (01) CP microaddress 211A; (10) Start address
register 479 (continuous); (11) the output of stack 478. Alternatively, another input
permits the two-bit select command to be overridden. In this case the CP microaddress
will be enabled whenever the module is enabled. This input permits the host to get
access to the WCS 470 for startup or debug.
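
The source selection amounts to a four-way multiplexer with an override, which may be sketched as below. The function and parameter names are hypothetical; the behaviour when the override is asserted but the module is not enabled is an assumption (the normal selection is used).

```python
def fp_microaddress(select, next_addr, cp_addr, start_addr, stack_out,
                    override=False, module_enabled=True):
    """Choose the FP microaddress source from the two-bit select field.

    00: FP next address logic, 01: CP microaddress,
    10: start address register, 11: stack output.
    With the override asserted and the module enabled, the CP microaddress
    is used regardless of the select field.
    """
    if override and module_enabled:
        return cp_addr
    return (next_addr, cp_addr, start_addr, stack_out)[select & 0b11]
```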

The outputs to control the FP clock generator 480 (which is ECL in the presently
preferred embodiment) can be selected as follows: (00) FP microcode clock is free
running; (01) FP microcode clock stopped. Another logical condition permits the FP
clock to be controlled by a different input, so that the clock free runs whenever
this is asserted.

Serial / Parallel Load Select

This PAL controls the loading of microcode from the host or the CP into the FP
module's WCS 470. Microcode loaded by the host must use the serial loop, but the CP
can load microcode in parallel. To achieve this, this PAL essentially implements a
2:1 multiplexer. There are several points to note:

1. The pipeline registers used in WCS 470 do not have separate output enables, so
they are separately enabled to prevent contention on the data buses.

2. The dependence of the two serial data clock signals is switched, depending on
whether the host or the CP is selected.

3. When the host is controlling the serial loop, then a [...] whether both serial
data clocks are driven together (for normal data shift), or only one of them is
driven (as selected by another signal). The serial data clocks are controlled like
this during the read back of data from the WCS 470.

This PAL decodes microcode fields in the WCS extension 490, to select which of the
registers 420 is to be accessed. The selected register is only written to when
RCREGDIR is LO. As well as selecting one out of the XFREG, FPREG, UAREG or MREG to be
written to, two other functions are performed:

1. The direction and output enable controls to the CD bus transceivers 444 are
generated.

2. The microcode bit to clear a breakpoint is write-enable-gated with the write-gate
clock. The use of a short pulse here prevents missing a breakpoint [...] immediate.
(It could pose problems if the CP were still holding a signal low to clear the
previous breakpoint when a new breakpoint [...].) ORing this signal (active-low AND)
with the clock [...] to keep it short.

All the clocks/strobes are qualified by CPMCCK and CPMCCKWQ to set their timings
[...] microcode cycle.



This PAL decodes the CP microcode fields to select which of the registers [...] to
be accessed. The selected register is only read when RCREGDIR is HI.



[...] Control PAL
Two PALs are used to control outputs from the holding registers 420.
The first one generates the transfer sequence waveforms used to enable clocks to
each pair of the registers 420. Each transfer cycle lasts from 1 to 4 minor cycles,
as specified by the transfer length. (The "minor cycle" period is generated by the
transfer clock 412, as discussed above.) On each minor cycle a pair of F_words is
transferred, although one of them may be inhibited by another PAL. The transfer
sequence waveforms appear on four lines as a "walking LO".

The first line in the cyclic sequence to be asserted is controlled by XFHRST<1:2>
and only occurs when XFINIT is HI. XFINIT is only [...] and on subsequent cycles the
current sequence waveform is used to generate the next. UCXFDIR disables
HRCKENP*<0:3> when the transfer direction is from holding registers 420 to register
file 430, unless the LOOPBACK mode is in operation. The HRCKALL overrides the normal
start and length controls, and forces all clock enables to be active at the same
time, thus quadruplicating the data into all register pairs in the one cycle.

The XFTYPE input selects whether the waveform [...] cycles or a parallel microcode
load cycle. In the latter case there are always 2 minor transfer cycles and the
timing can be slightly different. This input can [...] all the clocks to the holding
registers.

The second PAL generates the transfer sequence waveforms used to output enable each
register pair. These two PALs are used for opposite transfer directions.

A "clock mask" PAL generates the 8 clock enables used to control the writing into
the eight 32-bit registers (F_registers) which make up the holding register 420. In a
single major transfer cycle up to 8 F_words can be transferred into the 8 separate
registers of the register bank 420. The inputs show the first register which must be
updated (0...7), and the number of F_registers (1...8) to update. The PAL accordingly
generates a mask with a bit set for every register to be updated (within the major
transfer cycle). If the transfer direction is from the holding registers 420 to the register
file 430, then all the mask bits are set HI, thus preventing any writing to the holding
register. Similarly, if a microcode load cycle is occurring, then the clocks are disabled. If
all holding registers are to be cleared (as indicated by yet another signal), then the
enables are set low so all the holding registers are updated.
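
The mask generation just described can be illustrated with a short sketch. This is only a software model under stated assumptions, not the PAL equations: the function names are hypothetical, the enables are modelled as active-low (0 permits clocking), a run of registers is assumed not to wrap past register 7, and the "clear all" input is assumed to take priority over the direction and microcode-load inputs.

```python
NUM_REGISTERS = 8  # eight 32-bit F_registers make up holding register 420


def clock_mask(first, count):
    """Return 8 booleans; True marks a register selected for update."""
    if not 0 <= first < NUM_REGISTERS:
        raise ValueError("first register must be in 0..7")
    if not 1 <= count <= NUM_REGISTERS:
        raise ValueError("count must be in 1..8")
    # Assumption: the selected run does not wrap around past register 7.
    if first + count > NUM_REGISTERS:
        raise ValueError("run extends past register 7")
    return [first <= i < first + count for i in range(NUM_REGISTERS)]


def clock_enables(first, count, to_register_file=False,
                  microcode_load=False, clear_all=False):
    """Return 8 active-low clock enables (0 = register is clocked)."""
    if clear_all:
        # Assumed priority: clearing forces every enable low.
        return [0] * NUM_REGISTERS
    if to_register_file or microcode_load:
        # All enables HI: nothing is written into the holding registers.
        return [1] * NUM_REGISTERS
    return [0 if selected else 1 for selected in clock_mask(first, count)]
```

For example, `clock_enables(2, 3)` selects registers 2, 3 and 4 and leaves the other five enables high.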

Holding Register Start Address PAL

This PAL implements a 4:1 multiplexer followed by a register. The 4 possible
inputs to the multiplexer are: holding register (HR) start address from a register; HR
start address from the microcode instruction; HR start address from the CP address bus;
the previous HR start address.

If the module is not selected, then the previous HR start address is maintained.



Register File Write Enable PAL

This PAL controls the write enables to the register files 430. In a minor cycle 1
or 2 F_words can be written into the register file. Six bits of start position and length
are used to generate the write enable mask, in the same way the clock mask is
generated. The relevant 2 bits from the mask are sequenced out as a lower-half-write
signal or an upper-half-write signal, depending on which minor cycle is in progress. When
the loopback mode is active, the write enable mask is disabled. Another signal can be
used to force both words to be written on every minor cycle.

The lower-half-write signal and upper-half-write signal are disabled if the transfer
direction is wrong, or if the transfer type is a microcode load function. The input signals
are also decoded to select the read/write mode of the register file. A busy signal line is
also provided, to indicate the holding register data bus 422 is in use.

This PAL registers the register file address when the module is selected;
otherwise the previous address is held. The most significant bit of the address is modified
to implement the soft double buffering. A two-bit signal selects which modification
will be made to the most significant address bit. The options are:

1. Use the input bit. This is the physical addressing mode.

2. Use BANKSEL. This is the double buffered mode.

3. Use the inverse value of BANKSEL. This is the preview mode, whereby
the CP or FP can access data on the other side of the double buffer without having to
swap banks.
Register File Address Incrementer
This PAL (when enabled) increments the Register File pointer. Thus, the address
can be incremented at each minor cycle (of the transfer clock), to fetch out the next pair
of numbers from the register file 430, or write the next pair in. A control input permits
keeping the address constant during the first minor cycle of a transfer from holding
registers 420 to register file 430. This is necessary because of the pipelining in the data
path.

Data Valid Control PAL

This PAL controls the data valid signals to the even and odd sides of the register
files 430. In a minor cycle, either 1 or 2 F_words can be written into the register file.
Depending on the start address and length, one or two words of data will be valid in this
minor cycle. Two outputs (EVENVALID* and ODDVALID*) indicate which words are
valid. This function is disabled for transfers from the register file.

Microinstruction Address Selection PAL

This PAL selects the next microinstruction address to be from the true address
field (i.e. the output of register 474) or the false address field (the output of register 475).
An internal "always true" status can be selected for unconditional jumps. Both can be
disabled to allow the start address register 479 to drive the microaddress bus 473, or
when the STACKPOP or READSTACK* signals indicate that a stack operation is underway.
(STACKPOP is derived from FP microcode, while READSTACK is controlled by the CP.)

As noted above, the FP module 130 does not have a separate sequencer, in the
same sense that the CP module 110 and DTP module 120 do. In fact, the module does
not even have a separate program counter as such; instead, the true and false outputs of
registers 474 and 475 fill this function.

This PAL remembers when one of the floating point status bits has a
"sticky status" condition. (A "sticky" status is used, in the presently preferred embodiment,
to monitor some fault conditions separately from the primary error-handling mechanism.
For example, a test for overflow can be performed at the end of a vector operation rather
than on every element calculation.) The multiplier 440 and the ALU 460 each have several
outputs for sticky status bits (to show overflow, underflow, invalid operation, and similar
errors). Two microcode bits control the updating and the clearing of the sticky status
register on a per cycle basis.

A similar PAL performs this function for the FMPY status. The logic in these
PALs also provides encoded outputs to indicate various sticky status conditions. The clock
timing used permits the presence of a sticky status bit to be checked in one cycle.
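
A minimal software model may help make the sticky-status idea concrete. It is a sketch under assumptions rather than the PAL logic itself: the bit assignments and the class name are hypothetical, and a clear requested in the same cycle as an update is assumed to be applied first.

```python
# Hypothetical bit assignments for the per-cycle status outputs.
OVERFLOW = 0b001
UNDERFLOW = 0b010
INVALID = 0b100


class StickyStatus:
    """Accumulates status bits until they are explicitly cleared."""

    def __init__(self):
        self.bits = 0

    def cycle(self, status, update=True, clear=False):
        """Model one clock cycle; `update` and `clear` stand in for the
        two microcode control bits."""
        if clear:
            self.bits = 0          # assumed to happen before any update
        if update:
            self.bits |= status    # once set, a bit stays set
        return self.bits


def multiply_with_deferred_check(pairs, limit):
    """Multiply pairs, testing for overflow once at the end of the vector."""
    sticky = StickyStatus()
    results = []
    for x, y in pairs:
        product = x * y
        sticky.cycle(OVERFLOW if abs(product) > limit else 0)
        results.append(product)
    return results, bool(sticky.bits & OVERFLOW)
```

The point of the design is visible in `multiply_with_deferred_check`: the inner loop never branches on the status, and a single test after the loop reports whether any element overflowed.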

This PAL performs two separate functions: control of the subroutine stack
addressing and control of the table address counters.

Figure 39 shows the preferred embodiment of the stack register 478 in the
floating-point processor module 130. The PAL 3910 controls a multilevel pipeline register
3920. (In the presently preferred embodiment, this is an AMD 29520.) The multilevel
register 3920 includes four pipelined registers 3921. However, the output multiplexer 3922
can also select any one of these registers for direct output. The output of this multiplexer
is connected to the microinstruction address bus 473 of the FP module 130.

The PAL 3910 provides control inputs to multilevel register 3920 which make it
function as a LIFO (last-in-first-out) memory. This permits the memory to operate as a
stack. The PAL 3910 provides transfer signals 3912 (which are ANDed with the
microcode clock) to the pipelined registers 3921. It also provides a select signal 3913 to
the multiplexer 3922.

The PAL implements the usual push and pop functions. In addition, it can also
be commanded to enter a read-stack mode, where any stack level can be read without
disturbing the stack status.
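
The behaviour of such a four-level stack, including the non-destructive read-stack mode, can be sketched as follows. The class and method names are hypothetical, and raising an error on overflow is an assumption made for the sketch; the text does not say what the hardware does when a fifth address is pushed.

```python
class SubroutineStack:
    """Software model of a four-level LIFO with a non-destructive read."""

    DEPTH = 4  # four pipelined registers in the multilevel register

    def __init__(self):
        self._levels = []

    def push(self, address):
        if len(self._levels) == self.DEPTH:
            # Assumption: overflow is treated as an error in this model.
            raise OverflowError("stack is full")
        self._levels.append(address)

    def pop(self):
        if not self._levels:
            raise IndexError("stack is empty")
        return self._levels.pop()

    def read(self, level):
        """Read-stack mode: level 0 is the top; the stack is not changed."""
        if not 0 <= level < len(self._levels):
            raise IndexError("no entry at that level")
        return self._levels[-1 - level]
```

A debugger-style read such as `stack.read(1)` returns the second entry from the top while leaving subsequent `pop` results unchanged.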

Register File Address Modifier PAL
This PAL modifies the most significant bit of the FP register file address fields
as a function of address modifier code and the currently selected bank of the double
buffer. There are three address fields (X, Y and T) to be modified (corresponding to the
first operand bus, second operand bus 432, and results local bus 433), and the logic
is identical for each of them. The logic for one of these addresses will now be described.

A modified most-significant address bit is derived from the most significant bit
of the input address, a two-bit modifier code, and the bank select signal. The
modifications to the address bit are:

1. No modification - this is the physical addressing mode.

2. Inverse of the bank select signal - This is the "logical" mode, used for
normal accesses in the double buffered configuration. Note that the bank selection is
opposite to that used when data is transferred between the register file and the holding
registers.

3. Equal to the bank select signal - This is the preview mode. As
discussed above, in this mode the FP can access data on the other side of the double
buffer, without having to swap banks. This capability helps to keep the floating point
pipeline full.

The 3 modified address bits are registered externally and fed back in as "old A6"
bits (one for each address). These are used to replace the "calculated" values for these bits
when a "use old A6" command is asserted. This feature reduces the address setup time
when the address mode remains unchanged over several cycles.
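
The three modification modes, together with the feedback path for the previously calculated bit, can be summarised in a small sketch. The numeric mode encodings and the function name are hypothetical; the text specifies a two-bit modifier code but not its bit patterns.

```python
# Hypothetical encodings of the two-bit modifier code.
PHYSICAL = 0   # use the input bit unchanged
LOGICAL = 1    # inverse of the bank select signal
PREVIEW = 2    # equal to the bank select signal


def modified_msb(input_msb, mode, bank_select, use_old=False, old_bit=0):
    """Return the most significant register-file address bit (0 or 1)."""
    if use_old:
        # Reuse the externally registered bit instead of recalculating it.
        return old_bit
    if mode == PHYSICAL:
        return input_msb
    if mode == LOGICAL:
        return bank_select ^ 1
    if mode == PREVIEW:
        return bank_select
    raise ValueError("unknown modifier code")
```

With `bank_select = 1`, the logical mode addresses bank 0 and the preview mode addresses bank 1, so the two modes always reach opposite halves of the double buffer.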

Results Bus Control PAL
This PAL decodes the results-bus source microcode field and output enables the
required device (e.g. FMPY 440, FALU 450, or scratchpad memory 1610, in the
configuration of Figure [...]). This PAL also provides a chip enable signal to the scratchpad
memory 1610 when needed.

VME Interrupts PAL
This PAL implements the VME interrupt protocols in a state machine. When
GENVMEINT goes active (high), IRQEN is driven high on the next positive VCK edge.
IRQEN remains active until the interrupt is acknowledged, so the cause of the interrupt
(GENVMEINT) is removed by driving CLRIRQFP* low. The VIACK* and VIACKIN*
signals are monitored, and when an interrupt acknowledge cycle is detected for the
interrupt being generated, an internal interrupt acknowledge cycle is started. The correct
VME interrupt acknowledge cycle is identified by these signals going active (VIACKIN*
is part of a daisy chain) and VMEIA<01:03> being set to the same level the interrupt
was generated on. The internal interrupt acknowledge cycle waits for VMEIDS to be
asserted and then over a number of cycles enables the interrupt vector onto the data bus
(IVOE*), sets IVDTACK and removes IRQEN. Sometime later VMEIDS goes inactive and
the interrupt vector and IVDTACK are removed. When an interrupt acknowledge cycle
occurs, VINTACK is asserted, which then starts the BUSCON (via the address decode
PALs) on a slave cycle which will allow the interrupt vector onto the bus. The interrupt
acknowledge daisy chain passes through this PAL unhindered when no interrupts
are outstanding.

This PAL is only concerned with data transfers between the VME bus and the
data FIFO. The direction of the transfer is hidden from the state machine, so the signals
and status are switched externally. When DMARSTART goes active the state machine
starts the DMA transfer. It first waits for synchronized FIFO status (SDMAFSTAT*) to
indicate there is data or room in the FIFO for one transfer and SDMADONE to indicate
the DMA counters are ready. DMACK is driven low to output enable the FIFO in case
it is providing data. The state machine issues a request for the bus (LBUSREQ*) and
waits for it to be granted (SLBGRANT*). When the bus is granted, DMAAS* and
DMADS* are asserted in compliance with the VME bus setup times. These two signals
are held until the VME slave device returns the data transfer acknowledge (SLDTACK*)
and then DMACK is driven high. One cycle later DMAAS* and DMADS* are removed
and a positive edge driven on DMACOUNT. If the transfer mode (DMARBLOCK) is
single transfers then LBUSREL is asserted to release the bus and the above sequence
repeats. If the transfer mode is block (sequential) transfers then the bus is not released
unless the end of the block has been reached (as indicated by BLOCKEND), the FIFO is
full/empty (SDMAFSTAT*), the DMA count is finished (SDMADONE), or the DMA has been
aborted by the negation of DMARSTART. Note that during a block transfer the
DMAAS* is held active until released by BLOCKEND.
The SLBUSERR* input goes active when there has been a bus error as a result
of a DMA access. If this occurs the current transfer is aborted and DMABERR is driven.
The state machine remains in this state until DMARSTART is negated, which will clear
DMABERR. The final input, DMATEST, allows the DMA to occur without any VME bus
cycles occurring. This is useful in testing the basic operation of the state machine and
also provides a means whereby the FIFOs can be flushed in the event of a bus error.

A reset condition can be forced by using an unused combination of DMARSTART,
DMARBLOCK and DMARTEST.

DMA Address

The address bits (VMEIA<01:07>) are monitored to detect when a 256 byte
boundary is about to be reached so that a block DMA transfer can be interrupted
to allow VME arbitration. (This allows compliance with the maximum block transfer
length constraint in the VME specification.) This is indicated on BLOCKEND. The
remainder of the PAL is concerned with handling the DMA address incrementing.
Depending on the transfer size (16 or 32 bits) the DMA address is incremented by 1 or
2 respectively whenever DMAINC goes high. The incrementing of the DMA address is
controlled by DMARLONGINC, which selects whether DMACNTEN* is active for one or
two cycles of the microcode clock. CXRFF* resets the flip flop that caught the edge of
DMACOUNT. VMEDLST* is available to reset the PAL, if necessary.
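
The boundary test and the address increment can be sketched in a few lines. This is an illustrative model only: it assumes the DMA address counts 16-bit words (so that bits 01 to 07 span one 256-byte block of 128 words), and the function names are hypothetical.

```python
WORDS_PER_BLOCK = 128  # 256 bytes divided by 2 bytes per 16-bit word


def address_step(size_bits):
    """Increment applied per transfer: 1 for 16-bit, 2 for 32-bit."""
    if size_bits == 16:
        return 1
    if size_bits == 32:
        return 2
    raise ValueError("transfer size must be 16 or 32 bits")


def block_end(word_address, size_bits):
    """True when the next increment would reach a 256-byte boundary."""
    offset = word_address % WORDS_PER_BLOCK   # low seven address bits
    return offset + address_step(size_bits) >= WORDS_PER_BLOCK


def next_address(word_address, size_bits):
    """Address after one transfer of the given size."""
    return word_address + address_step(size_bits)
```

In this model a 32-bit block transfer starting at word 0 signals the end of the block on its 64th transfer, at which point the bus can be released for arbitration.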

Interrupt Edge Catcher
This PAL catches the positive edges on GIPIEMPTY*, VMEIFEMPTY*,
DP1IEMPTY*, DP2IEMPTY*, VTPINTD and VTPINT, and negative edges on
GIPOEMPTY* and VMEOFEMPTY*. This allows the interrupt signals to be edge
triggered and later synchronized to the microcode clock. When an edge is detected the
corresponding output is driven low. The edge catching flip flops are reset in pairs:
TPINTGIP* resets the two GIP edges, TPINTVMEF* resets the two VME edges,
TPINTVME* resets the two VTP edges and TPINTDPIPE* resets the two DP edges.

GIP Microcode Decode
This PAL decodes the three microcode signals UGIPRD*, UGIPWR* and UGIPFR to
generate the output enables, FIFO read and write clocks and the register clock. The FIFO read
clock is gated by the FIFO empty status (GIPOE*) to prevent the reading of an empty FIFO
causing errors within the FIFO. The clock type signals are qualified with GIPC1 or
GIPFRDCK.

GIP Interrupt Mask

This PAL performs two functions. First of all it selects 4 out of the 7 possible interrupt
sources and selectively inverts when necessary so the interrupting action results in a positive
edge. Two sets of 4 interrupt sources are allowed for and GIPSEU selects between them. The
second function is to mask the selected set by the 4 mask bits (GIPLM<0:3>) before driving
the results out as GIPINT<0:3>. The GIPLACK* signal is simply inverted to give GIPLACK.

GIP Interrupt State Machine

This PAL looks for positive edges on the interrupt inputs (GIPINT<0:3>) and when one
or more occurs GIPINT is driven. A specific interrupt is cleared by selecting it with the
microcode field UGIPCCS<0:1> and asserting UGIPCLAI. All flip flops are cleared on reset
by GIPRST*. The edges are detected by delaying the interrupts by one cycle and comparing the
delayed and non-delayed versions. The non-delayed versions have already been synchronized
to the GIPC1 clock that this state machine runs off.
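
The delay-and-compare edge detection described above can be modelled in software as follows. The class and method names are hypothetical, and the model treats each call to `clock` as one cycle of the synchronizing clock.

```python
class InterruptEdgeDetector:
    """Detects rising edges by comparing inputs with a one-cycle delay."""

    def __init__(self, width=4):
        self.delayed = [0] * width   # inputs as seen on the previous cycle
        self.pending = [0] * width   # one latched flag per interrupt input

    def clock(self, inputs):
        """Advance one cycle; return 1 if any interrupt is pending."""
        for i, level in enumerate(inputs):
            if level and not self.delayed[i]:
                self.pending[i] = 1  # rising edge: now high, was low
        self.delayed = list(inputs)
        return int(any(self.pending))

    def clear(self, index):
        """Clear one interrupt, as selected by a two-bit field."""
        self.pending[index] = 0

    def reset(self):
        """Clear all flip flops."""
        self.delayed = [0] * len(self.delayed)
        self.pending = [0] * len(self.pending)
```

Because only the transition is latched, an input that simply stays high after being cleared does not raise a second interrupt.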

Host Computer

A system like that shown in Figure 1 can be used in a wide variety of computer
architectures. The presently preferred embodiment uses the system of Figure 1 as a numeric
accelerator subsystem. The host computer is a VAX 8800, running a VME operating system,
and communicating with the system of Figure 1 over a VME interface and VME bus 4110.
However, an immense variety of other configurations could be used instead. For example, there
are a wide variety of UNIX™ machines which could be used, including e.g. units from Sun
Microsystems.

Moreover, other system bus structures could be used instead. For example, the
subsystem of Figure 1 could be used with a VAX running VMS, and linked through an
interface box. This subsystem can even be used with a personal computer running MS-
DOS, which communicates via Ethernet (for example), with a simple VME-bus interface
box.

It should also be noted that, although the internal architecture of the subsystem
of Figure 1 is primarily a 32-bit architecture, this subsystem can be used very
advantageously with 64-bit words or 48-bit words. One factor in achieving this capability
is the use of an internal data path in the floating-point processor module 130 which
permits 64-bit operations to be performed in only two cycles. Another factor in achieving
this, again, is the very wide cache bus 144, which permits multiple 64-bit words to be
transmitted in parallel to the numeric processor module 130. Thus, 64-bit
calculations can usually be performed at nearly half of the word rate (i.e. almost the
same bit rate) as 32-bit operations.

Moreover, of course, the numerous inventive teachings set forth herein can be
adapted to a tremendous variety of systems. These teachings can be adapted to systems
whose bus standards do not at all correspond to those of the presently preferred
embodiment. In fact, the VME bus interface is not even especially advantageous (aside
from having reasonable total bandwidth), and is disclosed simply to provide full
compliance with patentee's duty of disclosure.

20 ftttJlttffifiSSJsJifi^ 

As noted above, the presently preferred embodiment uses a VME bus as the
primary interface to the host. This bus is well known, as discussed above.

A wide variety of other bus configurations could be used instead. For example,
VersaBus, FutureBus, or NuBus could be readily designed into the system if desired. For
very high-speed computing systems, it might be advantageous to use optical busses, using
modulated solid-state lasers on optical fibers.

Picture Processor Subsystem

One advantageous system embodiment uses not only a host communicating with
one or more subsystems like that shown in Figure 1 (or 9A or 10), but also uses an
additional subsystem which is a specialized graphics processor. The most preferred picture
processor here is known as a "GIP" processor, and is available from benchMark
Technologies Ltd., Kingston-upon-Thames, England.

Figure 41 provides one sample configuration, but of course a wide variety of other
topologies and system architectures could be used instead. A host computer 4100
communicates with a picture processor subsystem 4140, and with at least two numeric
accelerator subsystems 4150 (which may be, for example, like those of Figures 1, 9A, 10),
over a VME bus 4110. The VME bus 4110 also permits access to main memory 4160,
mass storage 4170 (e.g. a hard disk), and optionally also one or more device interfaces
4180 (which may be output devices, gateways, other storage devices, etc.).

Two additional busses are used in this embodiment. The picture data bus 4130
provides an application-customized interface to a graphics processor. (This is a wide bus,
which is particularly useful for image or graphics transmission.) In this sample
embodiment, this is the "GIP bus" (marketed by benchMark Technologies Ltd). This
application-optimized bus is well-matched to the high-bandwidth I/O demands of the
picture processing subsystem 4140. It is a very wide bus, with 160 data lines.

The other backplane bus is the data pipe bus 4120. This bus permits multiple
numeric accelerator subsystems to be combined in topologies such as those shown in
Figures 34, 35, 36, or 37. In this sample embodiment, this bus has 32 data lines.

20 QsmtksLsLA&Amm 

Some important ways to use the various points of invention, and some ways to
use the disclosed system architecture, will now be described. A number of the methods
described are believed to be separately innovative.

Realization of a Sample Operation

A small example of the use of the architecture will now be described. In this example,
the host processor 4100 issues a command to the numeric accelerator subsystem 4150 (like that
of Figure 1, 9, or 10) to multiply two arrays together (on an element by element basis), and
deposit the results in a third array. All three arrays reside in the VME memory space (e.g. in
main memory 4160). Before the command is issued, the subsystem 4150 is in the idle state, and
after the command has been executed it returns to the idle state. This is also shown
diagrammatically in Figure 42.

Two versions of the command scenario are given. The first one details a system where
the only memory space used is physical memory. (Such an architecture might be used where it
is desired that the host offload as much of the work as possible onto the accelerator subsystem.)
The second scenario is for a system that has virtual memory, such as is found on a VAX running
VMS, or on a UNIX™ computer. In the second scenario it will be seen how the dynamic
memory allocation and the paging of data to/from the disks are accommodated in the processing


Figure 15 shows how the command memory 190 is organized. It also shows some of the
types of commands and interrupts exchanged, and how some of those commands and interrupts
are handled. A key point to note is that the command memory 190 is preferably partitioned in
software, so that it includes two command FIFOs. A cp_command FIFO 1520
buffers commands addressed to the CP module 110, and a dtp_command FIFO 1510 buffers
commands addressed to the DTP module 120.

The command interface, interaction, and scheduling of the work are controlled by
software, and can be tailored as required. Thus, the following example does not define ways
in which the system must be used. It is provided simply to illustrate ways in which the system
may be used.

Physical Memory Model (CP/DTP Interaction)

In this example, the host processor issues a command to the accelerator subsystem to
multiply two arrays together (on an element by element basis) and deposit the results in a third
array. All three arrays reside in the VME memory space. Before the command is issued the
accelerator subsystem is in the idle state, and after the
command has been executed it returns to the idle state. This is also shown
diagrammatically in Figure 42.

The following steps occur during the execution of a command:

(1) The host writes a vector multiply command into the accelerator
subsystem's command queue (maintained in the VME interface memory), specifying the
number of elements in the arrays, the address of the two source arrays, and the address
of the results array. After the command and its parameters are added to the queue, the
host generates an interrupt in the data transfer processor module 120. The host is now
free to do other work.

(2) On receiving the interrupt from the host, the data transfer processor
module 120 copies the command and its parameters into a software maintained
cp_command FIFO in the command memory. An interrupt is generated in the control
processor module 110 to notify it of the existence of this command. The data transfer
processor module 120 returns to its idle state.

(3) In response to the interrupt, the control processor module 110 leaves
its idle state, and reads the command and its parameters from the cp_command FIFO
1520 in the command memory 190. The addresses given in the command are checked and
found to lie off-board (i.e. not in the data cache memory 140). Thus, in this example, two
data fetch commands and an "interrupt CP when done" command are written to the
dtp_command FIFO 1510 in the command memory 190. Each data fetch command
specifies the source address of the array, its length, and its destination address in the
data cache memory. The data transfer processor module 120 is then interrupted, and the
control processor module 110 returns to its idle state.

(4) In response to the interrupt, the data transfer processor module 120
leaves its idle state, and reads the first command (and its parameters) from the
dtp_command FIFO 1510. The data transfer processor module 120 checks the address
where data is to be fetched from, and identifies that it lies within the VME address
space. The data transfer processor module 120 then sets up the DMA controller 640 in
the VME interface 160 to fetch the array and write it into the data FIFO 670 in the
VME interface 160. (Note that this is actually a hardware FIFO, unlike the command
queue FIFOs 1510 and 1520, which implement the first-in-first-out functionality in
software.) As this data arrives, the data transfer processor module 120 reads the data
from the data FIFO 670, and writes it into the data cache memory 140. When the
transfer is completed the dtp_command FIFO is checked to see what the next command
is (if any). In this case another fetch-data command is found, and is executed in an
identical fashion to the first fetch command. When this is finished the next command is
read and executed. This command generates an interrupt in the control processor module
110. The dtp_command FIFO 1510 is now empty, and the data transfer processor module
120 returns to its idle state.
(5) The interrupt informs the control processor module 110 that the two
arrays it requested are now stored in the data cache memory. Since the destination
address of the result array is off-board, the control processor module 110 allocates a
temporary array in the data cache memory 140 to hold the results. The CP module 110
now begins the calculation process. During the calculation process, the intermediate data sets
will be fetched from cache memory 140 into the register files of the FP module 130
(under control of the CP module 110); the FP module 130 will perform numeric
operations, running its own microcode and interfacing with the CP module 110 at
synchronization points; and the intermediate data sets will be transferred from the
register files of the FP module into the cache memory 140 (under control of the CP
module 110). Thus, when the vector multiply has been completed, the results will be
in the array in cache 140 which was previously allocated by the CP module 110.

(6) The control processor module 110 then writes a store-data command
and an "interrupt host when done" command to the dtp_command FIFO 1510. The
store-data command specifies the address of the result array in the data cache
memory, the destination address (as specified in the original command), and the array
length. The data transfer processor module 120 is interrupted. If the cp_command FIFO
is empty, the control processor module 110 returns to its idle state.

(7) In response to the interrupt, the data transfer processor module 120
leaves its idle state, and reads the first command (and its parameters) from the
dtp_command FIFO. The data transfer processor module 120 checks the address where
data is to be stored, and identifies that it lies within the VME address space. The data
transfer processor module 120 then sets up the DMA controller 640 in the VME interface
160 to transfer the correct number of F_words from the data FIFO 670 to the VME
memory. The data transfer processor module 120 reads the data from the data cache
memory and writes it into the data FIFO 670. When the result array has been
transferred into the data FIFO, the data transfer processor module 120 notifies the DMA
controller, and then waits until the DMA controller has finished the transfer to VME
memory.

(8) The dtp_command FIFO is not empty, so the next command is read
and executed. This is the interrupt-host-when-finished command. In response to this
command, the status of the command just completed is written to the command queue
in the VME interface memory, and a host interrupt is generated. The interrupt notifies
the host that its vector multiply command has ended, and it can read its status from the
status register in the VME interface 160. The data transfer processor module 120 then
returns to the idle state. This completes the operation.
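
The division of labour in steps (1) through (8) can be traced with a small simulation. This is a sketch under loud assumptions, not the actual microcode: host memory is modelled as a Python list indexed by address, the two software FIFOs are plain deques, the command tuples and all function names are hypothetical, and interrupts are simply recorded in a list.

```python
from collections import deque


def dtp_drain(dtp_fifo, host_memory, cache, events):
    """DTP side: execute queued commands until the software FIFO is empty."""
    while dtp_fifo:
        command = dtp_fifo.popleft()
        kind = command[0]
        if kind == "fetch":      # host memory -> data cache (step 4)
            _, source, cache_name, count = command
            cache[cache_name] = host_memory[source:source + count]
        elif kind == "store":    # data cache -> host memory (step 7)
            _, cache_name, destination, count = command
            host_memory[destination:destination + count] = cache[cache_name][:count]
        elif kind in ("interrupt_cp", "interrupt_host"):
            events.append(kind)  # steps 4 and 8
        else:
            raise ValueError(f"unknown DTP command: {kind}")


def vector_multiply(host_memory, src_a, src_b, dest, count):
    """Trace one host vector-multiply command through the CP and DTP."""
    cp_fifo, dtp_fifo = deque(), deque()
    cache, events = {}, []

    # Steps 1-2: the host command is copied into the cp_command FIFO.
    cp_fifo.append(("vmul", src_a, src_b, dest, count))

    # Step 3: the CP queues two fetches plus "interrupt CP when done".
    _, a, b, d, n = cp_fifo.popleft()
    dtp_fifo.extend([("fetch", a, "A", n), ("fetch", b, "B", n),
                     ("interrupt_cp",)])
    dtp_drain(dtp_fifo, host_memory, cache, events)          # step 4

    # Step 5: CP and FP compute into a temporary array in the cache.
    cache["R"] = [x * y for x, y in zip(cache["A"], cache["B"])]

    # Step 6: the CP queues the store plus "interrupt host when done".
    dtp_fifo.extend([("store", "R", d, n), ("interrupt_host",)])
    dtp_drain(dtp_fifo, host_memory, cache, events)          # steps 7-8
    return events
```

Running `vector_multiply(memory, 0, 3, 6, 3)` on a nine-element list leaves the element-wise products in positions 6 to 8 and returns the two interrupts in the order they were raised.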

There are several points to note regarding the above description:

At any time during the above process, the host could write a new
command and its parameters into the command queue, and interrupt the data transfer
processor module 120. The DTP module would then generate an interrupt request to the
control processor module 110 to notify it of the new command(s). If possible, their
execution will preferably be started as outlined above. This attempts to keep the control
processor module 110 and data transfer processor module 120 fully occupied in processing
commands or transferring data, but care needs to be taken so that conflicts
between commands do not occur.



The data address assignments provide considerable flexibility. Each of the
interfaces 150, 160, and 170 (and the local data cache memory 140) is assigned a range
of addresses that can be accessed through it. This allows the data transfer processor
module 120 to control the correct interface to satisfy the data requirements for the
command, without requiring separate command definitions for different data source or


In the example above, the commands originated from a host on the VME
bus, but they could as easily have come from any of the interfaces (or have been stored
as part of a command list) with very little change to the foregoing description. The
host was chosen as an example.
When the total array sizes required for a command exceed the free
storage in the data cache memory, the control processor module 110 will attempt to
process the command within the available storage space by dividing the command into
a number of smaller operations. However, for some types of command this will not be
possible, and the host will be notified of the command's failure.
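
For an element-wise operation such as the vector multiply, one plausible way to divide a command is to process the arrays in strips that fit the free cache space. The sketch below assumes three cache-resident arrays per strip (two sources and one result) and uses hypothetical function names; the text does not describe the actual partitioning policy.

```python
def split_into_strips(length, free_words, arrays_in_cache=3):
    """Return (offset, count) strips so each strip's arrays fit in the cache."""
    strip = free_words // arrays_in_cache
    if strip == 0:
        # Not even one element per array fits: the command cannot be split.
        raise ValueError("insufficient cache space for this command")
    strips = []
    offset = 0
    while offset < length:
        count = min(strip, length - offset)
        strips.append((offset, count))
        offset += count
    return strips


def multiply_in_strips(a, b, free_words):
    """Element-wise multiply performed one strip at a time."""
    result = []
    for offset, count in split_into_strips(len(a), free_words):
        chunk_a = a[offset:offset + count]   # would be fetched into cache
        chunk_b = b[offset:offset + count]
        result.extend(x * y for x, y in zip(chunk_a, chunk_b))
    return result
```

With 12 free words and ten-element arrays, the operation is carried out as strips of 4, 4 and 2 elements and produces the same result as a single pass.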

If the host sends commands too quickly, the internal software FIFOs may
become full. To prevent this affecting the overall operation, the following precautions are
taken. First, the dtp_command FIFO 1510 is at least 3 times the depth of the
cp_command FIFO 1520. Since one host command will rarely result in more than three
data transfer commands, the dtp_command FIFO can never completely fill as a result of
host commands.

When the cp_command FIFO reaches the nearly full mark, a status bit
in the VME interface is set.
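
A software FIFO with a nearly-full indication might look like the following sketch. The depths and the position of the nearly-full mark are hypothetical values chosen for illustration; only the 3:1 depth ratio comes from the text.

```python
from collections import deque


class CommandFifo:
    """Bounded software FIFO with a nearly-full status flag."""

    def __init__(self, depth, nearly_full_mark):
        self.depth = depth
        self.nearly_full_mark = nearly_full_mark
        self._items = deque()

    def put(self, command):
        if len(self._items) >= self.depth:
            raise OverflowError("command FIFO is full")
        self._items.append(command)

    def get(self):
        return self._items.popleft()

    @property
    def nearly_full(self):
        """Models the status bit the host can poll before sending more."""
        return len(self._items) >= self.nearly_full_mark


CP_DEPTH = 8  # hypothetical depth
cp_command = CommandFifo(CP_DEPTH, nearly_full_mark=6)
# The dtp_command FIFO is at least 3 times as deep as the cp_command FIFO.
dtp_command = CommandFifo(3 * CP_DEPTH, nearly_full_mark=3 * CP_DEPTH)
```

A host driver following this scheme would check `cp_command.nearly_full` and hold back further commands while the flag is set.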

Virtual Memory Model

The virtual memory situation introduces a number of complications which
necessitate more work being done in the host. These complications arise because the
application has access to a virtual address space that is very much larger than the
physical memory. The total virtual address space exists only on disk, and the portions of
the address space which the active software currently needs are paged into memory at
run-time as required. This can cause several types of problems:

An array, or parts of an array, may be only on disk, and not present in
physical memory. Furthermore, parts of an array needed by the accelerator subsystem may
be swapped out to make room for other tasks running in the system.

The physical address the array is assigned to is not predictable, since it
is a function of all the processing history since the computer was started.

Each virtual memory access goes through a translation procedure to
determine a physical address in order to access a particular data item. This results in
the array being non-contiguous in memory or scattered.

To avoid these problems, the arrays need to be locked in physical memory while
the data transfer processor module 120 is transferring them to/from the data cache
memory. Ideally, the arrays should be made contiguous. If the arrays cannot be made
contiguous, then the data transfer processor module 120 must perform a scatter/gather
operation as part of the transfer. However, it will need a scatter/gather table to know
where the data is distributed in physical memory.
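
A scatter/gather table can be pictured as a list of (start, count) extents in physical memory. The sketch below is illustrative only: physical memory is modelled as a Python list, and the table layout and function names are assumptions rather than the format actually used by the data transfer processor.

```python
def gather(physical_memory, table):
    """Collect a logically contiguous array from scattered extents."""
    data = []
    for start, count in table:
        data.extend(physical_memory[start:start + count])
    return data


def scatter(physical_memory, table, data):
    """Write a contiguous array back out to the extents in the table."""
    if sum(count for _, count in table) != len(data):
        raise ValueError("table does not describe exactly len(data) words")
    position = 0
    for start, count in table:
        physical_memory[start:start + count] = data[position:position + count]
        position += count
```

For an array whose pages ended up at physical offsets 2 and 10, the table `[(2, 3), (10, 2)]` lets the transfer treat the five words as one array in either direction.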

Preferably the application software (running on the host) is given the job of
organizing the transfer of data to/from the accelerator subsystem, and handling the
memory management functions that go with this. (In practice the application software
would not have to concern itself with most of these issues, as the math library
and a device driver would handle them. The industry standard array processor library
routines leave it up to the user to move data to and from the array processor (using
library routines). The different layers of software are described below, but at this point
no distinctions are made among them.)

In the array multiply example described above, the software
undertakes seven steps.

1) Transfer array A to accelerator subsystem and store at address AA
(performed by accelerator).

2) Transfer array B to accelerator subsystem and store at address BB
(performed by accelerator).

3) Wait for accelerator subsystem to finish the transfers (performed by
host).

4) Multiply the arrays at addresses AA and BB together and store the
result at CC (performed by accelerator).

5) Wait for accelerator subsystem to finish the multiply
(performed by host).

6) Transfer array at address CC into host address space (performed by
accelerator).

7) Wait for accelerator subsystem to finish the transfer (performed by
host).
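The seven steps can be written down as a command list that the host queues for the accelerator. The sketch below is only illustrative: the command names (`TRANSFER_IN`, `MULTIPLY`, `TRANSFER_OUT`, `WAIT`) and the tuple layout are assumptions made for this example, not the library's actual calls.

```python
def array_multiply_commands(a_host, b_host, c_host, aa="AA", bb="BB", cc="CC"):
    """Return the seven-step sequence for an array multiply as a list of
    (hypothetical) commands. WAIT entries are the host-side synchronization
    points; all other entries are carried out by the accelerator."""
    return [
        ("TRANSFER_IN", a_host, aa),   # 1) array A -> address AA
        ("TRANSFER_IN", b_host, bb),   # 2) array B -> address BB
        ("WAIT",),                     # 3) host waits for both transfers
        ("MULTIPLY", aa, bb, cc),      # 4) AA * BB -> CC
        ("WAIT",),                     # 5) host waits for the multiply
        ("TRANSFER_OUT", cc, c_host),  # 6) CC -> host address space
        ("WAIT",),                     # 7) host waits for the transfer
    ]
```

Steps 1 and 2 carry no wait between them, which reflects the fact that several commands can be queued before the host synchronizes.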
Important points to note about this sequence are:

Multiple commands can be sent to the accelerator subsystem. These are
queued up and processed.

The host injects synchronization points between the transfers and the
multiplication, to ensure that multiplication does not start until all the data is present
in the data cache memory.

The host is free to do other work instead of waiting for the accelerator
subsystem. However, the host's operating system will normally require an explicit wait
operation in order to synchronize with the accelerator subsystem.

Note that steps 3 and 5 could optionally be omitted, since the
synchronization of transfer and calculation operations can easily be done within the
accelerator subsystem as an option. However, this is incompatible with industry de facto
standards.

Memory allocation of the data cache memory is handled at a higher level
than the CP microcode executive.

The arrays are locked in memory, and the data fragmentation issues are
handled by the interface software between the application and the accelerator subsystem.
The frequent synchronization (or wait) points result in blocks of memory being locked
for shorter periods of time, which places less strain on a multi-user or multi-tasking
environment.

To execute a command the following steps occur:

(1) Host writes the command (transfer or calculation) into the accelerator
subsystem's command queue (maintained in the VME interface memory), specifying the
command type and the corresponding number of parameters. After the command and its
parameters are added to the queue, the host generates an interrupt in the data transfer
processor module 120. The host is now free to do other work.

(2) On receiving the interrupt from the host, the data transfer processor
module 120 suspends its current activity (either idling or some transfer) and examines
the command type. The command can be one of three types:

If the command is for the control processor module 110 (i.e. is a
calculation), the command and its parameters are copied into cp_command FIFO 1620
in the command memory 190. An interrupt is generated in the control processor module
110 to notify it of the command. The data transfer processor module 120 returns to its
previous activity.

If the command is for the data transfer processor module 120 (i.e.
is a transfer request), then the command and its parameters are copied into a
software-maintained dtp_command FIFO in the command memory. The data transfer processor
module 120 returns to its previous activity.

If the command is a synchronization command, no further
commands are taken from the queue until all outstanding commands have been
completed. To implement this, a 'wait for all and notify host' command is inserted in the
dtp_command queue.

(3) While in the idle state the data transfer processor module 120 is
continually checking the dtp_command FIFO. When this queue becomes 'not empty,'
the command is fetched from it and the operation carried out. In the case of a transfer
from host memory into the data cache memory, for example, the data transfer processor
module 120 sets up the DMA controller in the VME interface to fetch the array and
write it into the data FIFO. The data transfer processor module 120 reads the data
from the data FIFO and writes it into the data cache memory. When the transfer has
finished, the DTP module 120 removes the command from the dtp_command queue. If
another command is in the FIFO, it is executed; if the dtp_command queue is empty the
data transfer processor module 120 returns to the idle state.

(4) In response to the interrupt, the control processor module 110 leaves
its idle state and reads the command and its parameters from the software cp_command
FIFO in the command memory. The vector multiply of the arrays at addresses AA and
BB is completed and the resulting array is left at address CC in the data cache memory.
When the command has been executed it is removed from the cp_command FIFO 1620.
If no other command exists the control processor module 110 returns to its idle state.
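The routing in steps (1) through (4) can be modelled in a few lines. This is a sequential simulation under loud assumptions: the command kinds `CALC`, `TRANSFER` and `SYNC`, the log format and the function name are invented for illustration, and the real DTP and CP run concurrently rather than being drained one after the other.

```python
from collections import deque


def run_host_queue(host_queue):
    """Route each host command to a cp_command or dtp_command FIFO and
    return a log of the order in which work completes (simplified model)."""
    cp_fifo, dtp_fifo, log = deque(), deque(), []

    def drain():
        # Stand-in for "all outstanding commands have been completed".
        while dtp_fifo:
            log.append(("DTP", dtp_fifo.popleft()))
        while cp_fifo:
            log.append(("CP", cp_fifo.popleft()))

    for cmd in host_queue:
        kind = cmd[0]
        if kind == "CALC":          # for the control processor
            cp_fifo.append(cmd)
        elif kind == "TRANSFER":    # for the data transfer processor
            dtp_fifo.append(cmd)
        elif kind == "SYNC":        # take nothing more until everything is done
            drain()
            log.append(("HOST", "notified"))
        else:
            raise ValueError("unknown command type: %r" % (kind,))
    drain()
    return log
```

The only point the model tries to capture is that a synchronization command acts as a barrier between the commands queued before it and those queued after it.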
There are several points to note from the above description:

There is much less internal control and synchronization between the
control processor module 110 and data transfer processor module 120 than in the physical
memory model. The data transfer processor module 120 performs more of a control (or
command routing) function than the control processor module 110.

There are three queues active, one for the host communication, one for
the DTP's work, and one for the CP's work.

If any calculation requires more storage than is available on the
accelerator subsystem in the data cache memory, then it is the host's responsibility to
split the calculation up into smaller parts.

CP and FP Interaction

The control processor module 110 and floating-point processor module 130
interact very closely in order to implement an algorithm. The control processor module
110 calculates addresses and handles the data transfer between the data cache memory
and the floating-point processor module 130, while the floating-point processor module
130 does the data calculations. This interaction is independent of the type of interface
between the control processor module 110, data transfer processor module 120 and host
computer.

In the vector multiply command the floating-point processor module 130 calculates
the vector multiplies, eight elements at a time. Thus, for a large array, there could be
several thousand interactions (called synchronization points) between the control processor
module 110 and floating-point processor module 130. The synchronization points, in this
example, occur about every 400 ns and it is therefore very important to make them
efficient.

In most cases the control processor module 110 is able to do the address
calculations and data transfers more quickly than the floating-point processor module 130
can do the data calculations. If the reverse is true, then the waiting role is also reversed.

As discussed above, two flags (CPWAIT and FPWAIT) control the handshaking
between both processors. The FPWAIT flag is cleared by the control processor module
110 when it has transferred the next set of data to or from the floating-point processor
module 130. By testing this flag the floating-point processor module 130 can tell whether
it can proceed through the synchronization point or needs to wait for the control
processor module 110. The CPWAIT flag is cleared by the floating-point processor module
130 when it has finished the data calculations and is monitored by the control processor
module 110. The hardware is arranged so that when a flag has been cleared to allow a
processor through the synchronization point, it is automatically set once the
synchronization point has been passed.

Figure 22 is a state diagram which shows how the FPWAIT, CPWAIT, FPDONE,
and CPDONE flags are used to regulate the data interface between the CP module 110
and the FP module 130.

There have been many different implementations of handshaking logic and
semaphoring between processors. However, the state diagram shown in Figure 22 is very
advantageous, and is believed to be novel.

The data transfers between the control processor module 110 and floating-point
processor module 130 are double buffered, so that while the floating-point processor
module 130 is working on one set of data the control processor module 110 can be
working on the other. The double buffering is accomplished in software, as described
above. Both processors have signals to control the swapping of the buffer, and these are
"ANDed" together so the swap only occurs when both are active.

The vector multiply will take place in the following steps (steps with the same
number occur in parallel). These steps are also schematically represented in the flow
chart of Figure 33.

(1) The control processor module 110 sets the FPWAIT flag, and starts
the floating-point processor module 130 running the vector multiply microcode. The
floating-point processor module 130 waits for the FPWAIT flag to be cleared.

(2) The control processor module 110 transfers the first 8 elements from
both arrays into the double buffer (which, physically, is provided by the two banks of the
register files 430, as described above). The CP module then swaps the buffers over
to give the floating-point processor module 130 access to the data, and clears the FPWAIT
flag.

(3a) The control processor module 110 transfers the next 8 elements from
both arrays into the double buffer and clears the FPWAIT flag. It then waits for the
CPWAIT flag to be cleared (by the floating-point processor module 130).

(3b) The floating-point processor module 130, on detecting the FPWAIT
flag being cleared, starts calculating the vector multiply for the 8 pairs of elements stored
on the floating-point processor module 130 side of the double buffer. The 8 results are
written back into the double buffer and the CPWAIT flag is cleared. In this example, the
control processor module 110 has already finished and cleared the FPWAIT flag, so the
floating-point processor module 130 can change the buffers over and start the next set
of calculations immediately.

(4a) The control processor module 110 transfers the 8 results from the
double buffer into the data cache memory and then transfers the next 8 elements from
both arrays into the double buffer and clears the FPWAIT flag. It then waits for the
CPWAIT flag to be cleared (by the floating-point processor module 130).

(4b) The floating-point processor module 130, on detecting the FPWAIT
flag being cleared, starts calculating the vector multiply for the 8 pairs of elements stored
on its side of the double buffer. The 8 results are written back into the double buffer,
and the CPWAIT flag is cleared. In this example, the control processor module 110 has
already finished and cleared the FPWAIT flag, so the floating-point processor module 130
can swap the buffers over and start the next set of calculations.

(5) Steps (4a) and (4b) are repeated until the complete vector multiply
has been completed.

(6) At the end of step (5) the final set of results are still stored on the
FP's side of the double buffer, so the control processor module 110 swaps the buffers
over and transfers the last results into the data cache memory.
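The double-buffered sequence of steps (1) to (6) can be followed in a small sequential model. It is a sketch only: the two-element `banks` list stands in for the two banks of the register files, the CP and FP halves of each iteration would really run in parallel, and the swap at the end of each iteration stands in for the "ANDed" swap that happens once both flags are clear. All names are hypothetical.

```python
def vector_multiply_double_buffered(a, b, chunk=8):
    """Multiply a and b element-wise, `chunk` elements at a time, using two
    buffer banks that are swapped between a CP side and an FP side."""
    if len(a) != len(b) or len(a) % chunk:
        raise ValueError("arrays must have equal length, a multiple of chunk")
    n_chunks = len(a) // chunk
    results = []
    if n_chunks == 0:
        return results

    def load(i):
        lo = i * chunk
        return ("in", a[lo:lo + chunk], b[lo:lo + chunk])

    banks = [None, None]
    cp = 0                      # index of the bank currently on the CP side
    banks[cp] = load(0)         # step (2): CP loads the first chunk ...
    cp ^= 1                     # ... and swaps so the FP can see it

    for i in range(n_chunks):
        fp = cp ^ 1
        # CP half (steps 3a/4a): store earlier results, load the next inputs.
        if banks[cp] is not None and banks[cp][0] == "out":
            results.extend(banks[cp][1])
        banks[cp] = load(i + 1) if i + 1 < n_chunks else None
        # FP half (steps 3b/4b): multiply the pairs on its side.
        _, xs, ys = banks[fp]
        banks[fp] = ("out", [x * y for x, y in zip(xs, ys)])
        # Both sides are done, so the buffers are swapped.
        cp ^= 1

    # Step (6): after the last swap the final results are on the CP side.
    results.extend(banks[cp][1])
    return results
```

The CP is always one chunk ahead of the FP on loads and one chunk behind on stores, which is why neither side has to wait as long as the CP's transfers take less time than the FP's calculations.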

In the operation of systems like that of Figure 1, preferably the overall run-time
software environment is separated into several very distinct levels. Some of the levels
exist because they are distinct modules of code that run on separate processors, and other
levels exist to divide the different levels of interfacing required. The
inter-level interfacing is under software control, and can be changed if it does not fit
into the application's requirements.

This software organization is generally quite conventional. However, it is explicitly
set out here to provide a clear picture of the preferred use of the described innovations.

Figures 44A, 44B, and 44C show the programming environment of a system like
that shown in Figure 1. Note that many of the functional blocks shown have reference
numerals corresponding to those of hardware elements in other figures, but Figures 44A,
44B, and 44C are intended to show these blocks in the relation they might appear to a
programmer. Therefore, it should be noted that these figures do not necessarily
correspond exactly to the actual electrical and logical connections.

Application And Library Software

The following description assumes that the application software will be written
in a high level language, typically FORTRAN or C, and will call standard library
routines to use the accelerator subsystem. The calls conform to the de facto industry
standard (i.e. are generally compatible with the instruction set of products from Floating
Point Systems). They include routines to transfer data between the application's data area
and the accelerator subsystem's data cache memory, a wide variety of calculations, and
some synchronization routines.

The software at this level runs on the host computer system and implements the
desired application. It is linked to the libraries to gain access to the accelerator subsystem.

The libraries are the interface to the accelerator subsystem that the
application software sees. The libraries consist of several hundred
arithmetic/algorithmic functions, as well as routines to initialise the accelerator
subsystem and initiate data transfers of the application's arrays or data sets. Most library
routines will do little more than pass the input parameters and a function number on to
a device driver, but some parameter checking could be implemented if desired.

In the presently preferred embodiment the interface to the device driver is via system
calls. However, in some operating systems system calls carry heavy overheads because the
calling task is submitted for rescheduling.

The device driver can be considered as part of the operating system, and runs at
a more privileged level than the application software. Its main responsibilities are:

1) Transferring the commands and parameters from the library routines
into the command queue maintained in the accelerator subsystem's VME interface
memory. This entails some queue management and handling of the situation when the
queue is full.

2) Making sure that any data to be transferred (in virtual memory
systems) is locked in memory. This requires that the transfers have been split into
contiguous blocks and multiple small transfers have actually taken place, or scatter/gather
tables have been built and given to the accelerator subsystem.
3) Loading microcode into the multiple processors, and generally bringing
the hardware and microcode up to a known state (either after power-on or in
preparation for a new application to use it).

One of the most difficult aspects of porting the libraries and device driver to
a new host is the device driver. These tend to be very operating-system specific, and
require an intimate knowledge of the host system. Optionally, to avoid such problems,
the libraries can interface to the hardware directly if physical memory accesses are
allowed. This bypasses the need for a driver. This method of accessing the accelerator
subsystem will be much faster than using the device driver. However, it will also be less
secure, especially in a multi-user environment.

The microcode executive handles the residue of tasks, other than transfer and
calculation. Its tasks include the distribution of work between the control processor
module 110 and data transfer processor module 120, and internal and external
synchronization.

The executive is positioned on the other end of the queue from the device driver
and takes work off the queue. (This entails some queue management to ensure that work
is not taken from an empty queue.)

The level of complexity will decide which processor(s) are used, and will
depend largely on how much of the work the host wants to, or can, offload onto the
accelerator subsystem. The descriptions of the vector multiply command with physical
and virtual memory models demonstrated the different approaches the executive could
take.

In a physical memory architecture, the executive is split between the data transfer
processor module 120 and the control processor module 110. The data transfer processor
module 120 part does little more than command routing because the host and control
processor module 110 cannot exchange information directly. The control processor module
110 organizes the distribution of work and the handshaking.

Note that this split is somewhat arbitrary. In an alternative (and less preferred)
architecture, the control processor module 110 could act as a slave processor to the data
transfer processor module 120, rather than the other way around.

In the virtual memory model the data transfer processor module 120 was the
master and the control processor module 110 acted as the slave. Most of the control
aspects are handled in the host so the DTP's part of the executive only concerns itself
with command routing. The control processor module 110 contributes a small amount of
queue management.

Microcode Transfer Routines (DTP)

These routines concern themselves with the transfer of data between one of the
external interfaces and the data cache memory. The interface will primarily be to the
VME bus (and hence to the host memory).

Most of the transfers between the host memory and the data cache memory will
fit into a very narrow range of different types, such as: contiguous block transfer; transfer
with scatter/gather collection; every nth word; row/column 2-D array accesses.

Any types of transfers that do not fall into one of these categories can easily be
added as required. An important point to note here is that a vector add will use the same
transfer routines as a vector multiply. This is useful, as discussed above, since the data
transfer routines (as opposed to the calculation routines) do not have to distinguish
between a vector add and a vector multiply.
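The four transfer types named above differ only in the sequence of word addresses they visit. The following sketch lists those sequences; the function names, word addressing and the row-major layout for the 2-D case are assumptions chosen for illustration.

```python
def contiguous_block(base, count):
    """Addresses of `count` consecutive words starting at `base`."""
    return [base + i for i in range(count)]


def every_nth_word(base, count, n):
    """Addresses of `count` words taken with a stride of `n` words."""
    return [base + i * n for i in range(count)]


def scatter_gather(table):
    """Addresses described by a table of (start, length) blocks."""
    addrs = []
    for start, length in table:
        addrs.extend(range(start, start + length))
    return addrs


def column_of_2d_array(base, n_rows, n_cols, col):
    """Addresses of one column of a 2-D array (row-major layout assumed)."""
    return [base + r * n_cols + col for r in range(n_rows)]
```

None of these generators depends on what calculation will later be applied to the data, which is the reason the same transfer routine can serve a vector add and a vector multiply.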

Microcode Transfer Routines (CP)

These routines concern themselves with the transfer of data between the
data cache memory and the fast register files of the floating-point processor module 130.

Again, most of the transfers between the data cache memory and the register files
will fit into a small range of different types, such as: one vector in, one vector out; two
vectors in, one vector out; one vector in, a scalar(s) out. These transfer types can be
further classified according to their data type. The vectors could be simple or complex
data types, and there are a number of more specialized transfer types (such as FFT,
convolution, etc.) that are more efficient if the general routines are not used.

The important thing to note here is that the same transfer routines can be used
for different operations: a vector add will use the same transfer routines as a vector
multiply, for example.

For each calculation type, there is a routine to perform the eight (for example)
adds, subtracts or whatever is necessary. The data transfers governed by such routines
would be only those within the closely coupled data path which includes the fast register
files 430, the multiplier 460, the adder 440, and the scratchpad memory 1610. This data
path also includes several local busses, including the first operand local bus 431, the
second operand local bus 432, the results local bus 433, and the loopback connection 434.

Again, many of the required routines fall into a small number of standard data-
format categories. One example of such a category is dyadic vector operations (two
vectors in, one vector out; e.g. vector add or vector multiply). Thus, standard templates
can be set up for each calculation type within a category. This allows the rapid
production of FP microcode to implement many of the basic vector operations.

As discussed above, a registered operation specifier may be used to supplement
the microcode operation commands. This permits all the separate routines in a category
of calculation types to be formally written as a single routine. In this case the control
processor module 110 must load the operation register to specify the calculation type.

Note that the system described above has the capability to use compacted
microcode, wherein an operation specifier held in a register can be combined with the
remainder of the microcode instruction. This is actually used in the FP module 130, in
the presently preferred embodiment, as described above.

Such compacted microcode is particularly advantageous in a numeric processing
portion of a multiprocessor subsystem. In this case, the use of operation-specifier-
compacted microcode helps to reduce the need for overlaying operations.

Thus, for example, for an operation which mapped two arrays onto a third array
(e.g. Ci = Ai + Bi), the instruction register could be loaded with an operation specifier
(e.g. "ADD") before a sequence of such operations was begun. The sequence of operations
would then be stated in code which did not specify the operation directly.

Thus, this capability for real-time expansion of microcode makes the interface
between two microcoded processors, in a multiprocessor system, much more flexible.

This also greatly simplifies the bandwidth requirements of loading instructions
into the numeric processing portion. Thus, algorithm switching and re-partitioning of
tasks generally become more efficient.
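A software analogy may make the registered operation specifier easier to picture. In the sketch below one routine serves a whole category of two-vectors-in, one-vector-out operations, and the operation itself is selected by loading a register beforehand. The class, the method names and the specifier strings are hypothetical and are not the microcode format of the described system.

```python
import operator

# Hypothetical operation specifiers and the arithmetic they select.
OPERATIONS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


class FloatingPointUnit:
    """Toy model: a single two-input vector routine plus an operation register."""

    def __init__(self):
        self.operation_register = None

    def load_operation(self, specifier):
        # In the analogy, this is the control processor loading the register.
        self.operation_register = OPERATIONS[specifier]

    def two_vector_routine(self, xs, ys):
        # The routine itself never names the operation it performs.
        if self.operation_register is None:
            raise RuntimeError("operation register not loaded")
        return [self.operation_register(x, y) for x, y in zip(xs, ys)]
```

Switching from an add to a multiply then costs one register load rather than a reload of the routine, which is the bandwidth saving the text describes.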

Multiway Branching

As discussed above regarding Figures 3A and 3B, the present invention provides
significant new capabilities for multiway branching in microcoded systems. Figure 30
schematically shows the microcode operation used in the presently preferred embodiment
to provide multiway branching without address boundary constraints.
As described above, the present invention provides an architecture for microcoded
computer systems with no address constraints on multiway branching. Moreover, the
increment between alternative destinations is variable. A sequencer with relative
addressing capability is used.

The presently preferred embodiment uses the program counter as an input to the
jump destination. This is different from many previous implementations of multiway
branching where the base destination address is supplied from a different source.

Figure 31 diagrammatically shows some key features of an innovative
implementation of a discrete integral transform. In this example, the transform being
implemented is a fast Fourier transform (FFT).

The example shown is a 16-point radix-2 complex FFT. Of course, real-world FFT
implementations will use many more data points, but this example clearly shows some
important points. An n-point FFT normally requires log2 n stages, so that a 1024-point
FFT would require 10 stages. Each stage requires n/2 butterfly calculations to be
performed.

The butterfly calculation is given by:

r0 = r4 + [(r6 • r8) + (r7 • r9)]
r1 = r5 + {(r7 • r8) - (r6 • r9)}
r2 = r4 - [(r6 • r8) + (r7 • r9)]
r3 = r5 - {(r7 • r8) - (r6 • r9)}

where:

r0 and r1 are the real and imaginary parts of result C
r2 and r3 are the real and imaginary parts of result D
r4 and r5 are the real and imaginary parts of input A
r6 and r7 are the real and imaginary parts of input B
r8 and r9 are the real and imaginary parts of coefficient k.

(Note that the subexpressions enclosed in square brackets [] are formally identical and
the subexpressions enclosed in curly brackets {} are also formally identical.)
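The butterfly equations translate directly into a short routine. This sketch computes each bracketed subexpression once and reuses it, giving four multiplies and six additions or subtractions. The function name is hypothetical. Reading the signs as written, the equations compute C = A + B·conj(k) and D = A - B·conj(k) for k = r8 + j·r9; that interpretation of the sign convention is an observation made here, not a statement from the text.

```python
def butterfly(r4, r5, r6, r7, r8, r9):
    """Radix-2 butterfly on real/imaginary parts.

    (r4, r5) is input A, (r6, r7) is input B, (r8, r9) is coefficient k.
    Returns (r0, r1, r2, r3): result C as (r0, r1) and result D as (r2, r3).
    """
    square = (r6 * r8) + (r7 * r9)   # the [] subexpression, computed once
    curly = (r7 * r8) - (r6 * r9)    # the {} subexpression, computed once
    r0 = r4 + square
    r1 = r5 + curly
    r2 = r4 - square
    r3 = r5 - curly
    return r0, r1, r2, r3
```

For example, with A = 1 + 2j, B = 3 + 4j and (r8, r9) = (0.5, 0.25), the routine returns C = 3.5 + 3.25j and D = -1.5 + 0.75j.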

Figure 31 represents a four-stage FFT operation diagrammatically, with each
circle representing one butterfly calculation. The lines connecting to the left of each circle
show where the complex input samples (A and B) to the butterfly calculation come from,
and the lines connecting to the right indicate where the complex results (C and D) are
written to. The numbers within the circles are the complex phase coefficients k.

In the presently preferred embodiment of this method, the FFT algorithm is
implemented by being partitioned, in an architecture like that shown in Figure 1,
between the control processor module 110 and floating-point processor module 130. As
Figure 31 shows, the address calculations are not insignificant, especially where a large
number of data points is needed. The control processor module 110 performs the address
calculations, to provide the correct stream of data samples and phase coefficients for the
butterfly calculations. The butterfly calculations are actually performed by the floating-
point processor module 130.

The shaded bars marked on some of the data points, at each stage, show the
innovative data handling provided by this embodiment. The shaded bars shown at each
stage show one intermediate set of data transfers. Thus, for example, at the very
beginning of the process, 8 complex words (C_words) of input data are loaded in. This
amount of data provides sufficient input to perform four butterfly calculations.
(Coefficients must also be provided.) The shaded bars indicate that (for example) for the
first set of four butterflies, C_words X0, X1, X2, X3, X8, X9, X10, and X11 would be
loaded in. Four butterfly calculations are performed, and eight C_words of result are then
transferred out. In addition, loading the correct set of phase coefficients may require some
additional transfers. (Only one phase coefficient is used at the first stage, but note that
the number of different phase coefficients k doubles at each stage.) Thus, at least four
full cycles of bus 144 will be required for each set of four butterflies: two full cycles to
bring in eight C_words of input, and two full cycles to remove eight C_words of results.
(In addition, a fifth major cycle may be necessary to transfer in the coefficients.)

Eight C_words is equal to 512 bits, or sixteen F_words, so it may be seen that
this is a quite significant block of data. However, this method has proven to be an
advantageous way to make use of the high-bandwidth interface provided by the presently
preferred embodiment.

Moreover, transferring data in blocks of this size turns out to work very well with
the CP/FP handshaking logic used at synchronization points.

Thus, the FFT software is partitioned into two parts:

The control processor module 110 runs software which calculates
the address of the complex data, and the phase coefficient position in a table sequence.
The process running in the control processor module also controls the transfer of the data
and coefficients into the floating-point processor module 130. When the floating-point
processor module 130 has completed the butterfly calculations (and sets flags to indicate
that it is at a synchronization point), the control processor module 110 reads the results
and saves them. Note that the control processor module 110 has no knowledge of the
butterfly calculation; it merely interchanges data with the floating-point processor module
130 at synchronization points.

The FP module 130 runs software which calculates the butterfly by a
simple linear sequence of instructions that implements the equations as defined above.
This routine does not need to take account of the complicated address calculations needed
to provide the correct input data and coefficients. This routine can therefore be written
in total isolation from the software running on the CP module 110.

An advantageous feature of this partition is that the FP procedure at each stage
can be exactly the same, until the last two stages. (The butterfly calculations performed
during the last two stages use C_word inputs which fall more closely together, so that
some intermediate results can be carried forward inside the FP, as data held in register
430.)

This example also demonstrates the capability for processor independence. The
procedures executed by the FP module 130 are so simply defined that, if the floating-
point processor module 130 were redesigned around a different calculation unit chip set,
then only this simple butterfly routine would require changing. This may well be confined
to a re-assembly operation.

The execution of the CP and FP software occurs in parallel, and is pipelined so
that the speed at which an algorithm runs is determined by the slowest part.

Note that the FP instruction sequence, in implementing this butterfly calculation,
remains the same for all except the last two butterflies in the FFT. Thus, for example,
in a 1024-point complex FFT, the FP module would execute the same instruction
sequence 512 times, to do the calculations for the first eight stages. Only then would the FFT
begin running a different instruction sequence, for the last two stages.

FFT with Multiple FP Modules

Alternatively, a particularly attractive configuration is a system, like that shown
in Figure 10, which contains four FP modules 130.

There are two key factors which affect performance: the butterfly calculation time,
and the data transfer bandwidth to the "butterfly calculator" (e.g. the FP module 130).
The achieved performance is determined by whichever of these parameters is not met.
The following sample calculation relates to a 1K complex FFT, radix 2.

The basic radix 2 FFT butterfly equations consist of ten operations (4 multiplies
and 6 add/subtracts) when partial results can be reused. With a system like that of
Figure 1 (or Figure 10) this calculation takes 10 cycles, because the equations don't lend
themselves to using the ALU and multiplier in parallel. Using a 42 ns cycle time, the
butterfly calculation will take 420 ns. The true cycle times of 28 ns for the 6 ALU
operations and 42 ns for the four multiplies (336 ns in total) have been derated to 400
ns for this estimate, to cover overheads such as synchronization, pipeline startup,
etc. Thus, one FP module 130 can calculate a butterfly in 400 ns.

Each radix 2 butterfly calculation requires 2 complex samples, and a complex
coefficient (or twiddle factor). It produces 2 complex results. In total 5 complex numbers
or ten floating point words need to be transferred per butterfly between the data cache
memory 140 and the FPU. The cache memory bandwidth is 320 Mbytes per second, or
80M floating point words per second. This data rate is only achieved when 8 consecutive
words can be transferred in one memory cycle (100 ns). However, when executing an
FFT this can always be done. The most efficient way to use the memory bandwidth is to
transfer data for 4 butterflies per memory cycle. Thus four butterfly calculations require
5 memory transfer cycles.

A 1K complex FFT (radix 2) contains 5120 butterflies. The minimum time
permitted by the data transfer rate for this FFT is therefore given by:
(5120 / 4) * 5 * 100 ns = 640 microseconds.
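
The transfer-limited estimate can be reproduced with a short Python sketch. This is only a worked restatement of the arithmetic, assuming 8 floating-point words per 100 ns memory cycle and 10 words per butterfly; the function names are hypothetical.

```python
import math

WORDS_PER_MEMORY_CYCLE = 8   # floating-point words moved per cache access
MEMORY_CYCLE_NS = 100
WORDS_PER_BUTTERFLY = 10     # 5 complex numbers: 2 samples, 1 coefficient, 2 results


def butterflies_in_fft(points):
    """Radix-2 FFT: log2(points) stages of points/2 butterflies each."""
    stages = points.bit_length() - 1   # exact log2 for a power of two
    return stages * points // 2


def transfer_bound_us(points, butterflies_per_group=4):
    """Minimum FFT time, in microseconds, allowed by the memory bandwidth."""
    words = butterflies_per_group * WORDS_PER_BUTTERFLY
    cycles_per_group = math.ceil(words / WORDS_PER_MEMORY_CYCLE)
    groups = butterflies_in_fft(points) // butterflies_per_group
    return groups * cycles_per_group * MEMORY_CYCLE_NS / 1000


print(butterflies_in_fft(1024))   # 5120
print(transfer_bound_us(1024))    # 640.0
```

Grouping four butterflies gives 40 words, which fills exactly five 8-word memory cycles with no wasted transfer slots.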

However, this throughput estimate must be modified, by considering the effect
of the last two stages. Each data set (8 C.words) of results from a set of four butterfly
calculations at stage n-1 is sufficient to calculate 4 butterflies for stage n and 4 butterflies
for stage n+1, without returning the intermediate results back to memory. An additional
set of coefficients will, however, be needed for the second stage. The net result of this is
that 8 butterflies can be calculated with only 6 memory cycles. (This technique is further
described at pages 677 and 600 of L. Rabiner and B. Gold, Theory and Application of
Digital Signal Processing.)



A 1K complex FFT (radix 2) contains 5120 butterflies, so the minimum time as
governed by the data transfer rate with this two stage FFT algorithm is
(5120 / 8) * 6 * 100 ns = 384 microseconds.

This time is less than the estimated transfer time of 400 microseconds. Therefore, the
available memory bandwidth is well matched to a set of four FP modules working
together to achieve an FFT in 400 microseconds.

There are several techniques that can be used to reduce the bandwidth
requirements further:

1. The number of different coefficients used within a stage varies. For
example stage 1 uses 1 coefficient value for all butterflies, stage 2 uses 2 coefficients,
stage 3 uses four coefficients, etc., and stage 10 uses 512 coefficients.

For the earlier stages, there are big savings to be made in the memory bandwidth
by loading the coefficients at the beginning of the stage and not on every butterfly.

2. If four FFTs are performed in parallel (so that each of four FP modules
130, in a single accelerator subsystem, is used to calculate a separate FFT, rather than
one quarter of one FFT), then the coefficients can be broadcast to all four FPs. This
reduces the memory bandwidth used by the coefficient part of the transfers.

3. The two stage butterfly calculation can be extended to three or four
stages, the limiting factor being the size of the FP's register files to hold the new data,
the current data, and any intermediate storage. For example, a four stage algorithm
requires 16 samples and 8 coefficients, and produces 16 results after 32 butterfly
calculations. This gives a ratio of 10 memory cycles per 32 butterflies, which will allow
the cache memory bandwidth to support an FFT calculation every 160 microseconds.
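
The coefficient counts and multi-stage groupings described above can be sketched in Python as follows. The function names are hypothetical, and the count of 8 coefficients for the two-stage grouping (four per stage) is an assumption made here so that the total comes to the 6 memory cycles stated in the text; the four-stage figures are taken directly from the text.

```python
import math

WORDS_PER_COMPLEX = 2          # real and imaginary parts
WORDS_PER_MEMORY_CYCLE = 8     # floating-point words per cache access
MEMORY_CYCLE_NS = 100
BUTTERFLIES_1K = 10 * 512      # 10 stages of 512 butterflies (1K points, radix 2)


def distinct_coefficients(stage):
    """Number of different twiddle factors used in a stage (stages count from 1)."""
    return 2 ** (stage - 1)


def cycles_per_group(samples, coefficients, results):
    """Memory cycles to move one group; arguments are complex-word counts."""
    def cycles(n_complex):
        return math.ceil(n_complex * WORDS_PER_COMPLEX / WORDS_PER_MEMORY_CYCLE)
    return cycles(samples) + cycles(coefficients) + cycles(results)


def grouped_transfer_us(samples, coefficients, results, butterflies_per_group):
    """Transfer-limited time for a 1K FFT, in microseconds."""
    groups = BUTTERFLIES_1K // butterflies_per_group
    total_ns = groups * cycles_per_group(samples, coefficients, results) * MEMORY_CYCLE_NS
    return total_ns / 1000


print([distinct_coefficients(s) for s in (1, 2, 3, 10)])   # [1, 2, 4, 512]
print(grouped_transfer_us(8, 8, 8, 8))      # two-stage grouping: 384.0
print(grouped_transfer_us(16, 8, 16, 32))   # four-stage grouping: 160.0
```

The sketch makes the trade-off visible: deeper groupings move fewer words per butterfly, at the cost of holding more samples and intermediate values in the register files.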

All these ideas can be used with a radix 4 or radix 8 FFT if desired. In fact, the
very wide cache bus architecture provided may be particularly advantageous with higher
radix algorithms.

It should also be noted that other integral transforms can be similarly partitioned
into stages of multiple butterfly calculations, although the butterfly definitions and the
relations of the stages may be different. Thus, the foregoing teachings regarding data
manipulation can be applied to other discrete integral transforms as well.

Histogram Algorithm Implementation

Figure 32 shows a method of running a histogram algorithm, in hardware like
that shown in Figure 16.

As Figure 16 shows, the data path, in the calculation portion of the numeric
processor subsystem 130, preferably includes not only a multiplier 440 and an adder 460,
but also a scratchpad memory 1610 which is very closely coupled to this portion of the
data path. (This memory includes address logic 1611.)

The scratchpad memory 1610 gives the module 130 the ability to calculate an
address and fetch the data locally. Without the memory 1610, the FP module 130 would
have to give the address to the CP module 110, which in turn would do the lookup
function and return the result back to the FP module 130. Note that this would require
significant additional handshaking, which would be very inefficient. Thus, the architecture
of this small-scale data-path portion also cooperates advantageously with the large-scale
data-handling architecture used for interface to the numeric processing module, as
described above.

In the presently preferred embodiment, the scratchpad memory 1610 can be used
in three ways: it can be used as a table memory, for algorithms such as calculation of
transcendental functions; it can be used as a local stack; or it can be used in histogram
algorithms, to collect results.

The ability to use this scratchpad memory as a stack is particularly advantageous,
since this permits the data interface, at the edge of the calculation portion of the
subsystem, to be defined in a way which is very advantageous for the overall architecture,
without requiring that the register files at that interface be capable of utilization as a
stack.

Compilation of routines from common high-level languages (such as FORTRAN)
into microcode is an important way of generating microcode programs. Compilation of
vector operations into efficient microcode is relatively easy. However, there will nearly
always be a significant fraction of scalar operations as well, and compilation of these is
significantly trickier.

It has been discovered that compilation of scalar routines into microcode proceeds
particularly well if a stack-based architecture can be used as the virtual machine. (The
conventional procedure for doing this uses translation into reverse Polish logic.)
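
To illustrate why a local stack suits scalar code, the following Python sketch evaluates an expression already translated into reverse Polish form. It is a generic textbook evaluator offered as an assumption about the kind of virtual machine meant here, not the compiler or microcode of the described system; the function name and token format are hypothetical.

```python
import operator

# Binary operators supported by this toy evaluator.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens):
    """Evaluate a reverse Polish token list using a single operand stack."""
    stack = []   # stands in for the scratchpad memory used as a local stack
    for token in tokens:
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    return stack.pop()


# (3 + 4) * 2 written in reverse Polish order
print(eval_rpn("3 4 + 2 *".split()))   # 14.0
```

Every operand reference is a push or a pop, so no register allocation is needed; that is the property which makes scalar compilation straightforward on a stack machine.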

The use of this scratchpad memory to accumulate results is particularly
advantageous with histogram algorithms. When histogram algorithms are run, the
histogram data can be accumulated in the table memory. This avoids adding access load
to the data cache bus.

This use of a closely coupled local memory to collect histogram data is particularly
advantageous in image processing algorithms. Many known image processing algorithms
use histogram computations, but the massive volumes of data which must be handled
mean that cache bandwidth is at a premium. This innovative teaching helps make the
use of histogram algorithms more useful.

Figure 32 shows a simple example of a fairly typical histogram procedure which
is applicable to many image processing problems. Note that the histogram table is
accessed at every iteration of the inner loop of this procedure. Therefore, providing a very
closely coupled storage for the histogram table will tremendously reduce the bandwidth
requirements for a procedure of this kind.
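
Figure 32 is not reproduced here, so the following Python sketch shows only a generic histogram loop of the kind described, under the assumption of integer pixel values in the range 0 to bins-1. The names are hypothetical. The `table` list stands in for the local scratchpad table, and the returned access count shows how often it is touched.

```python
def histogram(image, bins=256):
    """Count pixel values; returns (table, number of table accesses)."""
    table = [0] * bins      # stands in for the closely coupled table memory
    accesses = 0
    for row in image:
        for value in row:   # inner loop: one read-modify-write per pixel
            table[value] += 1
            accesses += 1
    return table, accesses


image = [
    [0, 1, 1],
    [2, 1, 0],
]
table, accesses = histogram(image, bins=4)
print(table)      # [2, 3, 1, 0]
print(accesses)   # 6
```

Because the table is updated once per pixel, keeping it in local memory removes one read-modify-write per pixel from the cache bus, leaving the bus free to stream the image data itself.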

Pipelined Algorithm with Preview

A significant teaching contained herein is a method of running a pipelined
algorithm, using a software-controlled double buffer with a preview mode to maintain
average throughput through synchronization points.

Figure 33 shows a method of running a pipelined algorithm, in hardware which
includes a software-controlled double buffer like that shown in Figure 20.

As noted above, the use of a software-controlled double buffer is very useful in
crossing a clock boundary between high-speed calculation units and a higher level of
control. However, it should be noted that the advantages of a software-controlled double
buffer extend to a very wide variety of pipelined algorithms.

The preferred subsystem for double buffering uses a dual port memory,
partitioned in software so that the top half of the memory is allocated to one processor,
and the bottom half to the other. (This allocation is switched when both processors set
respective flag bits indicating that they are ready to switch.)

On accesses to this memory, additional bits tag the access as "physical," "logical,"
or "preview." A physical access is interpreted as a literal address within the full memory,
and the double buffering is ignored. A logical access is supplemented by an additional
address bit, determined by the double buffering switch state.

A preview access is used for read access only, and goes to the opposite bank of
memory from that which would be accessed in a logical access. The use of preview access
can be particularly advantageous in avoiding data flow inefficiencies at synchronization
points in pipelined algorithms.
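
The three access types can be modelled in software as a minimal sketch. The class below is a toy Python model written for illustration, not the register-file hardware: the class name, the port names "cp" and "fp", and the method names are all assumptions introduced here.

```python
class SoftDoubleBuffer:
    """Toy model of a software-partitioned dual-port memory with two halves."""

    MODES = ("physical", "logical", "preview")

    def __init__(self, half_size):
        self.half_size = half_size
        self.mem = [0] * (2 * half_size)
        self.bank = {"cp": 0, "fp": 1}           # half currently owned by each port
        self.ready = {"cp": False, "fp": False}  # per-port "ready to switch" flags

    def _address(self, port, offset, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown access mode: {mode}")
        if mode == "physical":
            return offset                        # literal address, buffering ignored
        if not 0 <= offset < self.half_size:
            raise IndexError("offset outside one buffer half")
        bank = self.bank[port]
        if mode == "preview":
            bank ^= 1                            # look into the other port's half
        return bank * self.half_size + offset

    def read(self, port, offset, mode="logical"):
        return self.mem[self._address(port, offset, mode)]

    def write(self, port, offset, value, mode="logical"):
        if mode == "preview":
            raise PermissionError("preview access is read-only")
        self.mem[self._address(port, offset, mode)] = value

    def signal_ready(self, port):
        """Set this port's flag; swap halves once both ports have signalled."""
        self.ready[port] = True
        if all(self.ready.values()):
            for name in self.bank:
                self.bank[name] ^= 1
            self.ready = {name: False for name in self.ready}


buf = SoftDoubleBuffer(half_size=8)
buf.write("cp", 0, 3.5)                 # one side fills its half
early = buf.read("fp", 0, "preview")    # the other side peeks before the swap
buf.signal_ready("cp")
buf.signal_ready("fp")                  # both ready: the halves are exchanged
after = buf.read("fp", 0)               # same datum, now via a logical access
```

In this model the bank-select bit is simply inverted for a preview read, which mirrors the description of a preview access going to the opposite bank from a logical access.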

For example, if the standard double buffering techniques were used in a system
like that shown in Figure 1, it would be necessary to refill the data pipeline after every
swap and empty it before. In this sample embodiment, a simple vector operation requires
the floating-point processor to do 8 calculations for each buffer's worth of data. This
means that three cycles of overhead are used, to fill and empty the pipeline, for every
eight words of data. Obviously, this adds a high percentage on to the overall average
processing time.
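
The size of this overhead can be made concrete with a short calculation. The Python sketch below assumes, for illustration only, that each of the 8 calculations occupies one cycle; the function names are hypothetical.

```python
def overhead_added(useful_cycles, overhead_cycles):
    """Extra time added by pipeline fill and empty, as a fraction of useful time."""
    return overhead_cycles / useful_cycles


def pipeline_efficiency(useful_cycles, overhead_cycles):
    """Fraction of all cycles that perform useful calculations."""
    return useful_cycles / (useful_cycles + overhead_cycles)


print(overhead_added(8, 3))                   # 0.375
print(round(pipeline_efficiency(8, 3), 3))    # 0.727
```

Under that assumption, conventional double buffering would add 37.5% to the processing time of each buffer's worth of data.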

One of the innovative teachings set forth herein is that "soft" double buffering can
be used to overcome this problem. The preview mode (described above) allows one port
to preview the data in the other half before it is swapped. This latter mode provides a
means for the floating-point processor pipeline to be kept full when the control processor
has finished its work and is waiting to swap buffers before continuing.

Preferably double buffering is used in a register file at the interface between a
numeric processor and a large data cache memory in a multiprocessor system. The
partitioning of the register file avoids data collisions in the cache memory 140.

In this sample embodiment, a 5-ported register file 430 is used to implement the
memory for the double buffer. However, a wide variety of other implementations could
be used instead.

This innovation provides much greater flexibility than conventional systems which
perform double buffering in hardware, at no loss in speed.

In particular, the "preview" mode permits this double-buffering implementation
to be used as a versatile interface architecture in many pipelined environments.

Factors limiting Performance 

There are six fundamental factors that can limit maximum performance. They
are:

The I/O bandwidth (which in the presently preferred embodiment is 40
Mbytes per second);

The data cache memory bandwidth (which in the presently preferred
embodiment is 320 Mbytes per second);

The data transfer rate between the floating-point processor module 130
holding registers and the register file. This is currently less than the data cache memory
bandwidth.

Address calculation rate (which in the presently preferred embodiment is
typically 10 million per second, but this is very dependent on the algorithm being run).

The sustained floating point calculation rate. In the presently preferred
embodiment, for a single precision "add" this is less than 28 ns cycle time (and likely to
improve as faster components become available), and for a single precision multiply it is
less than 42 ns cycle time.

The number of numeric processing modules used in parallel.

The factor which determines the performance for a particular algorithm depends
very much on which of the following conditions apply:

Where the source data and results are stored: The best performance is
achieved when the data is stored in the data cache memory. If the data is stored off-
board, then it is very likely that the data I/O transfer rate will be the limiting factor. The
achievable I/O rate will usually be determined by the peripherals involved and the type
of transfers supported (single or block). An I/O rate of 40 Mbytes per second will limit
the calculation rate to 3.3 Mflops, for a calculation where three numbers are involved in
every calculation.

The ratio of data to arithmetic operations: This determines whether the
floating point calculation rate or the data transfer rate is the bottleneck. Algorithms
which require relatively little data for the amount of calculations (e.g. FFTs) will be
limited by the floating-point processor module 130 speed. An example of an algorithm
that is data transfer limited is vector add, which requires 3 data values per arithmetic
operation.

The layout of data in data cache memory: The maximum transfer rate
between the data cache memory and the floating-point processor module 130 is only
achievable when 8 contiguous F.words (i.e. floating-point words, of 32 bits each) are
transferred together. If the data for an algorithm cannot make use of this block transfer
ability, then the net data transfer rate will drop.
This is tabulated below:

Number of F.words     Rate
        8             80 MF.words per second
        4             40
        2             20
        1             10
Most algorithms can make use of the higher transfer rates. (In fact, even the FFT can
make use of higher transfer rates, as discussed above.)

Overlapped operations: This allows off-board I/O transfers to occur in
parallel to the floating point calculations. If the algorithms (or sequence of algorithms)
can use this facility then the relatively slow I/O transfer rate might not affect the overall
calculation rate.

Multiple FPs: When an algorithm is calculation bound and not limited by
the memory or I/O bandwidth then multiple FPs can give a multiple of the single
floating-point processor module 130 performance, provided the memory bandwidth is not
exceeded. For example, with 4 FPs there is no increase in the vector add performance,
but an FFT is calculated 4 times faster.
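
The conditions above can be tied together with a simple bound model, sketched below in Python. It is an illustrative simplification, not a description of the actual scheduler: it assumes transfers and calculations overlap perfectly, that every transfer occupies a full 100 ns memory cycle, and that a floating-point word is 4 bytes. The per-item timings in the usage lines are derived here from figures quoted in the text (28 ns per add, 400 ns per butterfly, 8 words per 100 ns) and all names are hypothetical.

```python
BYTES_PER_WORD = 4      # 32-bit floating-point word
MEMORY_CYCLE_NS = 100


def io_bound_mflops(io_mbytes_per_s, words_per_operation):
    """Calculation rate sustainable when every operand crosses the I/O bus."""
    return io_mbytes_per_s / BYTES_PER_WORD / words_per_operation


def transfer_rate_mwords(words_per_transfer):
    """Cache transfer rate when each memory cycle carries this many words."""
    return words_per_transfer * 1000 / MEMORY_CYCLE_NS


def multi_fp_speedup(compute_ns, transfer_ns, n_fps):
    """Speedup from n_fps modules when the slower of compute and transfer wins."""
    single = max(compute_ns, transfer_ns)
    multiple = max(compute_ns / n_fps, transfer_ns)
    return single / multiple


print(round(io_bound_mflops(40, 3), 1))                  # 3.3
print([transfer_rate_mwords(n) for n in (8, 4, 2, 1)])   # [80.0, 40.0, 20.0, 10.0]

# Vector add: one 28 ns add, three words at 12.5 ns per word.
print(multi_fp_speedup(28, 37.5, 4))    # 1.0
# FFT butterfly: 400 ns compute, 6 memory cycles shared by 8 butterflies.
print(multi_fp_speedup(400, 75, 4))     # 4.0
```

In this model the vector add is transfer bound even with one module, so extra modules add nothing, while the butterfly stays calculation bound up to four modules.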

As will be appreciated by those skilled in the art, the innovations disclosed
can be applied in a wide variety of contexts, and are subject to a wide range of
modification and variation. Therefore, the full scope of claimed patent protection is not
defined by any of the sample embodiments set forth herein, nor by any statements made
herein concerning those embodiments, but is defined solely by the claims appended
hereto.

WHAT IS CLAIMED IS:

1. A computer system, comprising:

first and second processors, connected to execute independent respective
instruction streams separately;

a clock generator, connected to monitor the instruction streams of both said
first and second processors, and accordingly to generate a shared clock signal which

is connected to clock both said first and second processors, and
has a variable duration which is equal to the greater of: the
duration required by said first processor for the instruction then being executed thereby,
and the duration required by said second processor for the instruction then being executed
thereby.

2. The system of Claim 1, wherein said clock generator monitors both of the
respective streams of instructions being executed by said first and second processors.

3. A computer system, comprising:

an external interface, connectable to an external interface bus;

a data transfer processor, which is operable concurrently with said control
processor to execute a separate respective stream of instructions, and is connected to

a numeric processor, which is operable concurrently and asynchronously
with said control processor and data transfer processor, and which executes a respective
instruction sequence under the overall control of said numeric processor, and

a shared clock, connected to clock said control processor and said data
transfer processor, said clock having a variable duration which is dependent on the
instruction being executed by said control processor and also on the instruction being
executed by said data transfer processor.

4. The system of Claim 3, wherein said shared clock is provided by a clock
generator which monitors both said respective stream of instructions being executed by said
data transfer processor and also the respective stream of instructions being executed by
said control processor.



5. The system of Claim 3, wherein said shared clock has a variable duration which
is equal to the greater of: the duration required by said first processor for the instruction
then being executed thereby, and the duration required by said first processor for the
instruction then being executed thereby.
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